Topic: Reverse engineering Anlogic AL3_10 FPGA

Offline pcprogrammerTopic starter

  • Super Contributor
  • ***
  • Posts: 3690
  • Country: nl
Re: Reverse engineering Anlogic AL3_10 FPGA
« Reply #25 on: September 10, 2022, 10:02:53 am »
Looked at the clock system and it is still a mystery. Did manage to get the used PLL settings. Had to manually tweak the ICP_CURRENT setting to get the setting bits to match. No idea what this does though.

The PLL is set to generate just a single 200MHz clock. I now have to figure out what is connected to this clock. It is a step by step process of creating a verilog design and comparing the result with the original bit stream. This is also needed to get a better understanding of the global clock routing.

I guess it should be possible to generate gate level verilog from what I have found so far, but that would still need to be translated into a more readable design anyway. So back to generating some more lists to get some insight into how the blocks are connected together.

Offline pcprogrammerTopic starter

« Reply #26 on: September 10, 2022, 03:14:47 pm »
The mystery surrounding the global clock is slowly disappearing. The manual has a section on it and by looking at the specific configuration bits there is some light at the end of the tunnel.

Since in this design PLL0 only outputs one clock, which is routed via the left premux, it is possible to trace it onto one of the 8 lines going to the center distribution hub. From there on it is still a bit of a puzzle to figure out how the connections are made based on the signal names used in the configuration bit data.

It is also through the premux blocks that internally derived clocks are routed into the global clock network. I have seen two signals being routed this way, and can start to trace those too.

Offline pcprogrammerTopic starter

« Reply #27 on: September 11, 2022, 06:29:13 pm »
Well, I somewhat figured it out. The central global clock multiplexer has a rather complicated setup for selecting which signal from the pre-multiplexers is put onto the global clock spines.

There are two sets of 6 bits per multiplexer to make the selection. Which bits are set depends on the active signal pair.

Code: [Select]
MC1_S_0 (A)*~(B+C+D+E+F+G+H+I+J+K+L+M) 13 WIRE(VPSPX0) ARCVAL(VPSPX0,HTK0) ARCVAL(VPSPX0,HTK6) ARCVAL(VPSPX0,HTK12) ARCVAL(VPSPX0,HTK18) ARCVAL(VPSPX0,HTK24) ARCVAL(VPSPX0,HTK30) ARCVAL(VPSPX0,HTK0) ARCVAL(VPSPX0,HTK6) ARCVAL(VPSPX0,HTK12) ARCVAL(VPSPX0,HTK18) ARCVAL(VPSPX0,HTK24) ARCVAL(VPSPX0,HTK30)
MC1_S_1 A+B+C+D+E+F+G+H+I+J+K+L 12 ARCVAL(VPSPX0,HTK1) ARCVAL(VPSPX0,HTK7) ARCVAL(VPSPX0,HTK13) ARCVAL(VPSPX0,HTK19) ARCVAL(VPSPX0,HTK25) ARCVAL(VPSPX0,HTK31) ARCVAL(VPSPX0,HTK1) ARCVAL(VPSPX0,HTK7) ARCVAL(VPSPX0,HTK13) ARCVAL(VPSPX0,HTK19) ARCVAL(VPSPX0,HTK25) ARCVAL(VPSPX0,HTK31)
MC1_S_2 A+B+C+D+E+F+G+H+I+J+K+L 12 ARCVAL(VPSPX0,HTK2) ARCVAL(VPSPX0,HTK8) ARCVAL(VPSPX0,HTK14) ARCVAL(VPSPX0,HTK20) ARCVAL(VPSPX0,HTK26) ARCVAL(VPSPX0,HTK32) ARCVAL(VPSPX0,HTK2) ARCVAL(VPSPX0,HTK8) ARCVAL(VPSPX0,HTK14) ARCVAL(VPSPX0,HTK20) ARCVAL(VPSPX0,HTK26) ARCVAL(VPSPX0,HTK32)
MC1_S_3 A+B+C+D+E+F+G+H+I+J+K+L 12 ARCVAL(VPSPX0,HTK3) ARCVAL(VPSPX0,HTK9) ARCVAL(VPSPX0,HTK15) ARCVAL(VPSPX0,HTK21) ARCVAL(VPSPX0,HTK27) ARCVAL(VPSPX0,HTK33) ARCVAL(VPSPX0,HTK3) ARCVAL(VPSPX0,HTK9) ARCVAL(VPSPX0,HTK15) ARCVAL(VPSPX0,HTK21) ARCVAL(VPSPX0,HTK27) ARCVAL(VPSPX0,HTK33)
MC1_S_4 A+B+C+D+E+F+G+H+I+J+K+L 12 ARCVAL(VPSPX0,HTK4) ARCVAL(VPSPX0,HTK10) ARCVAL(VPSPX0,HTK16) ARCVAL(VPSPX0,HTK22) ARCVAL(VPSPX0,HTK28) ARCVAL(VPSPX0,PIBCLKB0) ARCVAL(VPSPX0,HTK4) ARCVAL(VPSPX0,HTK10) ARCVAL(VPSPX0,HTK16) ARCVAL(VPSPX0,HTK22) ARCVAL(VPSPX0,HTK28) ARCVAL(VPSPX0,PIBCLKB0)
MC1_S_5 A+B+C+D+E+F+G+H+I+J+K+L 12 ARCVAL(VPSPX0,HTK5) ARCVAL(VPSPX0,HTK11) ARCVAL(VPSPX0,HTK17) ARCVAL(VPSPX0,HTK23) ARCVAL(VPSPX0,HTK29) ARCVAL(VPSPX0,PIBCLKB1) ARCVAL(VPSPX0,HTK5) ARCVAL(VPSPX0,HTK11) ARCVAL(VPSPX0,HTK17) ARCVAL(VPSPX0,HTK23) ARCVAL(VPSPX0,HTK29) ARCVAL(VPSPX0,PIBCLKB1)

MC1_Z_0 (A)*~(B+C+D+E+F+G+H+I+J+K+L+M) 13 WIRE(VPSPX0) ARCVAL(VPSPX0,HTK5) ARCVAL(VPSPX0,HTK4) ARCVAL(VPSPX0,HTK3) ARCVAL(VPSPX0,HTK2) ARCVAL(VPSPX0,HTK1) ARCVAL(VPSPX0,HTK0) ARCVAL(VPSPX0,HTK5) ARCVAL(VPSPX0,HTK4) ARCVAL(VPSPX0,HTK3) ARCVAL(VPSPX0,HTK2) ARCVAL(VPSPX0,HTK1) ARCVAL(VPSPX0,HTK0)
MC1_Z_1 A+B+C+D+E+F+G+H+I+J+K+L 12 ARCVAL(VPSPX0,HTK11) ARCVAL(VPSPX0,HTK10) ARCVAL(VPSPX0,HTK9) ARCVAL(VPSPX0,HTK8) ARCVAL(VPSPX0,HTK7) ARCVAL(VPSPX0,HTK6) ARCVAL(VPSPX0,HTK11) ARCVAL(VPSPX0,HTK10) ARCVAL(VPSPX0,HTK9) ARCVAL(VPSPX0,HTK8) ARCVAL(VPSPX0,HTK7) ARCVAL(VPSPX0,HTK6)
MC1_Z_2 A+B+C+D+E+F+G+H+I+J+K+L 12 ARCVAL(VPSPX0,HTK17) ARCVAL(VPSPX0,HTK16) ARCVAL(VPSPX0,HTK15) ARCVAL(VPSPX0,HTK14) ARCVAL(VPSPX0,HTK13) ARCVAL(VPSPX0,HTK12) ARCVAL(VPSPX0,HTK17) ARCVAL(VPSPX0,HTK16) ARCVAL(VPSPX0,HTK15) ARCVAL(VPSPX0,HTK14) ARCVAL(VPSPX0,HTK13) ARCVAL(VPSPX0,HTK12)
MC1_Z_3 A+B+C+D+E+F+G+H+I+J+K+L 12 ARCVAL(VPSPX0,HTK23) ARCVAL(VPSPX0,HTK22) ARCVAL(VPSPX0,HTK21) ARCVAL(VPSPX0,HTK20) ARCVAL(VPSPX0,HTK19) ARCVAL(VPSPX0,HTK18) ARCVAL(VPSPX0,HTK23) ARCVAL(VPSPX0,HTK22) ARCVAL(VPSPX0,HTK21) ARCVAL(VPSPX0,HTK20) ARCVAL(VPSPX0,HTK19) ARCVAL(VPSPX0,HTK18)
MC1_Z_4 A+B+C+D+E+F+G+H+I+J+K+L 12 ARCVAL(VPSPX0,HTK29) ARCVAL(VPSPX0,HTK28) ARCVAL(VPSPX0,HTK27) ARCVAL(VPSPX0,HTK26) ARCVAL(VPSPX0,HTK25) ARCVAL(VPSPX0,HTK24) ARCVAL(VPSPX0,HTK29) ARCVAL(VPSPX0,HTK28) ARCVAL(VPSPX0,HTK27) ARCVAL(VPSPX0,HTK26) ARCVAL(VPSPX0,HTK25) ARCVAL(VPSPX0,HTK24)
MC1_Z_5 A+B+C+D+E+F+G+H+I+J+K+L 12 ARCVAL(VPSPX0,PIBCLKB1) ARCVAL(VPSPX0,PIBCLKB0) ARCVAL(VPSPX0,HTK33) ARCVAL(VPSPX0,HTK32) ARCVAL(VPSPX0,HTK31) ARCVAL(VPSPX0,HTK30) ARCVAL(VPSPX0,PIBCLKB1) ARCVAL(VPSPX0,PIBCLKB0) ARCVAL(VPSPX0,HTK33) ARCVAL(VPSPX0,HTK32) ARCVAL(VPSPX0,HTK31) ARCVAL(VPSPX0,HTK30)

The above shows the sets for the wire VPSPX0. Only one pair of ARCVAL signals can be active. An odd thing is that in each line the first six ARCVAL entries are repeated in the next six. This might have to do with higher density FPGAs having more global clock lines, but I'm not sure about that.

I now have to write some code to do a reverse lookup from the bits set to find which HTK or PIBCLKB signal is active.
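
A minimal sketch of such a reverse lookup, in Python, under assumptions read off the listing above and not confirmed by any Anlogic documentation: the 36 sources are ordered HTK0..HTK33, PIBCLKB0, PIBCLKB1; the MC1_S set is selected by index modulo 6 and the MC1_Z set by index divided by 6; and the `(A)*~(B+...+M)` equation means bit 0 of each set is on only when the wire is used and the source is not in group 0. The function names are made up for the illustration.

```python
# Sources selectable for VPSPX0, in the order implied by the listing:
# HTK0..HTK33 followed by PIBCLKB0 and PIBCLKB1 (indices 34 and 35).
SOURCES = [f"HTK{n}" for n in range(34)] + ["PIBCLKB0", "PIBCLKB1"]


def encode(source):
    """Return the MC1_S_*/MC1_Z_* bits expected to be set for one source.

    Assumption: index % 6 picks the S group and index // 6 picks the Z
    group. Bit 0 of each set follows (A)*~(B+...+M), so it is set only when
    the wire is in use and the source is NOT in group 0.
    """
    index = SOURCES.index(source)
    s_group, z_group = index % 6, index // 6
    bits = set()
    if s_group != 0:
        bits |= {"MC1_S_0", f"MC1_S_{s_group}"}
    if z_group != 0:
        bits |= {"MC1_Z_0", f"MC1_Z_{z_group}"}
    return frozenset(bits)


# Reverse table: set of bit names -> source name.
REVERSE = {encode(source): source for source in SOURCES}


def decode(bits):
    """Look up the active source for a set of bit names, or None if unknown."""
    return REVERSE.get(frozenset(bits))


print(decode({"MC1_S_0", "MC1_S_4", "MC1_Z_0", "MC1_Z_5"}))  # PIBCLKB0
```

Under these assumptions HTK0 sets none of the twelve bits, so it cannot be told apart from an unused wire by these bits alone. The sketch assumes the caller already knows the wire is in use.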

Another thing I have to change in my code is that the global clocks are not global to the whole FPGA, but only to one half of it. For instance GCLK_0 can be a different signal in either half. Despite the fact that the block diagram of the clock system shows quadrants and there are four central clock multiplexers, the first set of 8 vertical lines is multiplexed by the lower central multiplexer and the next 8 vertical lines are multiplexed by the upper central multiplexer.

Lots of puzzles still to be solved.


Offline pcprogrammerTopic starter

« Reply #28 on: September 14, 2022, 08:43:46 am »
The next hurdle is going to take a lot of experiments: figuring out the configuration of the logic slices. The lookup table itself is simple, but the flip flop part is trickier. An lslice also has more functionality in the sense that it can do two 4 input LUTs or one 5 input LUT, which is controlled by setting bits, but then the question is how the outputs of the LUTs are routed onto which output of the lslice.

Another thing to figure out is how the connections of the carry bits are made. The slices have a dedicated connection bus between them that does not show up in the global topology routing through the switch boxes.

For the global clocks there still is the question about what the premux default connection is. In the design I found that the top and bottom premux have signals on the input, but no setting bits for the multiplexer nor enable bits for the wires, but within the ctmux there are connections for these signals.

Again the story continues.

Offline pcprogrammerTopic starter

« Reply #29 on: September 17, 2022, 01:37:41 pm »
Did a fair bit of experimenting with the Tang Dynasty IDE to get more insight into the logic blocks and decided to generate gate level verilog first. The IDE can make gate level verilog for simulation, and together with a file containing their own macros I'm able to match the used setting bits to the parameters of the macros.

Found out that I can't do this for the pins without losing the assignment to the actual pins, so I have to use the pin constraints file and some higher level verilog to make it work.

The reason for taking this path is that I can then generate a new bit stream based on the generated gate level verilog and see if it comes out the same, and when not, tweak things to get it right. Had to do this for one of the experiments I did, which showed me that the pin out gets lost, but also that some default parameters lead to unwanted bits in the configuration.

The code I'm writing will not be a universal tool for reverse engineering any AL3-10 design. It is custom written to fit the bill for the FNIRSI 1013D and maybe 1014D designs. Things like DSP blocks or special pin configurations like LVDS are not handled by the code.

Offline pcprogrammerTopic starter

« Reply #30 on: September 18, 2022, 06:50:24 pm »
Today I managed to generate the pin constraint file and the main module with the external signal names.

The code recognizes the bits for slew rate and pull type and can determine if a pin is set for input or output when not connected to any visible routes. This is the case for the clock input pin. It connects directly to the pll, but I can't tell which bits are set to make this work.

For now I will manually configure the global clock connections because I already examined what these connections are.

Next up is generating code for the logic slices. I will have to test with several scenarios if the correct code is generated.

I do feel things are on the right track, but still have a lot to do before proper top level verilog is obtained, because gate level verilog is not very useful, other than giving the ability to regenerate the bit stream. Making modifications to the design on that level is not really doable.

It does help with interpreting the setting bits though.

Offline pcprogrammerTopic starter

« Reply #31 on: September 19, 2022, 12:04:29 pm »
Hmm, hit a bit of a snafu |O

Simply translating the logic blocks into either an AL_PHY_MSLICE or AL_PHY_LSLICE macro does not lead to the same bit stream when run through the IDE again. A slice can have quite a few different functions, and to get the exact same bit stream the logic has to be mapped to the correct functional macro.

For instance, when a simple two input AND gate is intended, an AL_MAP_LUT2 is needed and the logic equation needs to be given like (B*A). This means that the lookup table has to be examined together with the used inputs to find the correct equation. Not too hard for a two input function, but more so when it is a 5 or even 6 input one.

Getting the bits set in the lookup table is easy and already implemented. Now I have to write a lot of code to determine the type of macro to use for the slice or part of the slice, and then, when needed, make up the correct equation for it.
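
As an illustration of the step from lookup table bits to an equation, here is a minimal Python sketch that expands an INIT value into a plain sum of products. It assumes bit i of INIT holds the output for the input combination where "a" is bit 0 of i, "b" is bit 1, and so on, which is consistent with the INIT/EQN pairs in the TD generated netlist shown later in this thread. The function name and the "0" returned for an empty table are made up here, and the IDE may well expect a differently formatted or minimized equation than this canonical form.

```python
def init_to_equation(init, num_inputs):
    """Turn a LUT INIT value into a sum-of-products equation string.

    Bit i of INIT is the output for the input combination where input A is
    bit 0 of i, B is bit 1, and so on. '*' is AND, '+' is OR, '~' is NOT.
    """
    names = "ABCDEF"[:num_inputs]
    terms = []
    for row in range(1 << num_inputs):
        if (init >> row) & 1:
            # One product term per truth table row that outputs 1.
            literals = [
                name if (row >> bit) & 1 else "~" + name
                for bit, name in enumerate(names)
            ]
            terms.append("*".join(literals))
    if not terms:
        return "0"  # hypothetical placeholder for a constant-zero table
    return "(" + "+".join(terms) + ")"


print(init_to_equation(0x8, 2))   # (A*B)
print(init_to_equation(0x68, 3))  # (A*B*~C+A*~B*C+~A*B*C)
```

The number of terms grows with the number of ones in the table, so for 5 and 6 input tables some form of minimization would probably be wanted on top of this.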

Ah well, it is all part of the adventure I embarked on :)

Offline pcprogrammerTopic starter

« Reply #32 on: September 23, 2022, 12:02:31 pm »
There is still a lot of work to be done, but I'm getting some output.

I made a simple design in verilog to see what the different types of logic would do in the setting bits. With the IDE it is possible to make gate level verilog for simulation, which gives a good insight into what I need to produce.

The verilog used to make the bit stream.
Code: [Select]
module pin_test
(
  input  wire i_led_red_control,
  input  wire i_led_blue_control, 
  input  wire i_led_yellow_control,
  output wire o_led_red, 
  output wire o_led_blue, 
  output wire o_led_green,
  output wire o_led_yellow
);

reg ireg_red_led = 0;

always@(posedge i_led_red_control)
  begin
    ireg_red_led <= ~(i_led_blue_control | i_led_yellow_control);
  end

assign o_led_yellow = (~i_led_yellow_control & (i_led_red_control & i_led_blue_control)) | (i_led_yellow_control & (i_led_red_control ^ i_led_blue_control));
assign o_led_red    = i_led_red_control | i_led_blue_control;
assign o_led_blue   = i_led_red_control ^ i_led_blue_control;

assign o_led_green = ireg_red_led;

endmodule

The gate level output from the IDE
Code: [Select]
// Verilog netlist created by TD v5.0.28716
// Thu Sep 22 20:04:13 2022

`timescale 1ns / 1ps
module pin_test  // pin_test.v(1)
  (
  i_led_blue_control,
  i_led_red_control,
  i_led_yellow_control,
  o_led_blue,
  o_led_green,
  o_led_red,
  o_led_yellow
  );

  input i_led_blue_control;  // pin_test.v(4)
  input i_led_red_control;  // pin_test.v(3)
  input i_led_yellow_control;  // pin_test.v(5)
  output o_led_blue;  // pin_test.v(7)
  output o_led_green;  // pin_test.v(8)
  output o_led_red;  // pin_test.v(6)
  output o_led_yellow;  // pin_test.v(9)

  wire i_led_blue_control_pad;  // pin_test.v(4)
  wire i_led_red_control_pad;  // pin_test.v(3)
  wire i_led_yellow_control_pad;  // pin_test.v(5)
  wire n1;
  wire o_led_blue_pad;  // pin_test.v(7)
  wire o_led_green_pad;  // pin_test.v(8)
  wire o_led_red_pad;  // pin_test.v(6)
  wire o_led_yellow_pad;  // pin_test.v(9)

  AL_PHY_PAD #(
    //.LOCATION("P88"),
    //.PCICLAMP("OFF"),
    //.PULLMODE("NONE"),
    .IOTYPE("LVCMOS33"),
    .MODE("IN"),
    .TSMUX("1"))
    _al_u0 (
    .ipad(i_led_blue_control),
    .di(i_led_blue_control_pad));  // pin_test.v(4)
  AL_PHY_PAD #(
    //.LOCATION("P28"),
    //.PCICLAMP("OFF"),
    //.PULLMODE("NONE"),
    .IOTYPE("LVCMOS33"),
    .MODE("IN"),
    .TSMUX("1"))
    _al_u1 (
    .ipad(i_led_red_control),
    .di(i_led_red_control_pad));  // pin_test.v(3)
  AL_MAP_LUT2 #(
    .EQN("(~B*~A)"),
    .INIT(4'h1))
    _al_u10 (
    .a(i_led_blue_control_pad),
    .b(i_led_yellow_control_pad),
    .o(n1));
  AL_PHY_PAD #(
    //.LOCATION("P31"),
    //.PCICLAMP("OFF"),
    //.PULLMODE("NONE"),
    .IOTYPE("LVCMOS33"),
    .MODE("IN"),
    .TSMUX("1"))
    _al_u2 (
    .ipad(i_led_yellow_control),
    .di(i_led_yellow_control_pad));  // pin_test.v(5)
  AL_PHY_PAD #(
    //.LOCATION("P49"),
    //.PCICLAMP("OFF"),
    //.PULLMODE("PULLDOWN"),
    //.SLEWRATE("MED"),
    .DRIVE("8"),
    .IOTYPE("LVCMOS33"),
    .MODE("OUT"),
    .TSMUX("0"))
    _al_u3 (
    .otrue(o_led_blue_pad),
    .opad(o_led_blue));  // pin_test.v(7)
  AL_PHY_PAD #(
    //.LOCATION("P39"),
    //.PCICLAMP("OFF"),
    //.PULLMODE("NONE"),
    //.SLEWRATE("SLOW"),
    .DRIVE("8"),
    .IOTYPE("LVCMOS33"),
    .MODE("OUT"),
    .TSMUX("0"))
    _al_u4 (
    .otrue(o_led_green_pad),
    .opad(o_led_green));  // pin_test.v(8)
  AL_PHY_PAD #(
    //.LOCATION("P112"),
    //.PCICLAMP("OFF"),
    //.PULLMODE("KEEPER"),
    //.SLEWRATE("FAST"),
    .DRIVE("8"),
    .IOTYPE("LVCMOS33"),
    .MODE("OUT"),
    .TSMUX("0"))
    _al_u5 (
    .otrue(o_led_red_pad),
    .opad(o_led_red));  // pin_test.v(6)
  AL_PHY_PAD #(
    //.LOCATION("P34"),
    //.PCICLAMP("OFF"),
    //.PULLMODE("NONE"),
    //.SLEWRATE("SLOW"),
    .DRIVE("8"),
    .IOTYPE("LVCMOS33"),
    .MODE("OUT"),
    .TSMUX("0"))
    _al_u6 (
    .otrue(o_led_yellow_pad),
    .opad(o_led_yellow));  // pin_test.v(9)
  AL_MAP_LUT2 #(
    .EQN("~(~B*~A)"),
    .INIT(4'he))
    _al_u7 (
    .a(i_led_blue_control_pad),
    .b(i_led_red_control_pad),
    .o(o_led_red_pad));
  AL_MAP_LUT2 #(
    .EQN("(B@A)"),
    .INIT(4'h6))
    _al_u8 (
    .a(i_led_blue_control_pad),
    .b(i_led_red_control_pad),
    .o(o_led_blue_pad));
  AL_MAP_LUT3 #(
    .EQN("(A*B*~(C)+A*~(B)*C+~(A)*B*C)"),
    .INIT(8'h68))
    _al_u9 (
    .a(i_led_blue_control_pad),
    .b(i_led_red_control_pad),
    .c(i_led_yellow_control_pad),
    .o(o_led_yellow_pad));
  AL_MAP_SEQ #(
    .CEMUX("1"),
    .CLKMUX("CLK"),
    .DFFMODE("FF"),
    .REGSET("RESET"),
    .SRMODE("ASYNC"),
    .SRMUX("0"))
    ireg_red_led_reg (
    .clk(i_led_red_control_pad),
    .d(n1),
    .q(o_led_green_pad));  // pin_test.v(14)
endmodule

What comes back from my code so far.
Code: [Select]
module pin_test
(
  output wire o_led_yellow,
  input wire i_led_yellow_control,
  input wire i_led_red_control,
  output wire o_led_green,
  output wire o_led_blue,
  output wire o_led_red,
  input wire i_led_blue_control
);

  AL_MAP_LUT2 #
  (
    .EQN("B@A")
  )
  _al_block_18_lut_0
  (
    .a(x0y10_pin_28_di_net_3),
    .b(x34y16_pin_88_di_net_8),
    .o(x10y5_mslice1_f_0_net_6)
  );

  AL_MAP_LUT2 #
  (
    .EQN("B+A")
  )
  _al_block_23_lut_0
  (
    .a(x0y10_pin_28_di_net_3),
    .b(x34y16_pin_88_di_net_8),
    .o(x15y13_mslice1_f_0_net_7)
  );

endmodule

The pin constraint file is fully generated, so the AL_PHY_PAD macros are not needed and would give problems with the pin assignment as written before. But I do need to list the wires and output assigns to connect the internally used wires to the external wires.
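
A rough Python sketch of generating those wire declarations and assigns. The function and parameter names are invented for the example, and the pin to net pairings are only inferred from the LOCATION comments and equations in the listings above (P28 is the red control input, P88 the blue one, the XOR feeds o_led_blue and the OR feeds o_led_red).

```python
def emit_connections(input_pins, output_pins):
    """Build wire declarations and assign statements as Verilog text.

    input_pins maps an external input name to the internal net it drives;
    output_pins maps an external output name to the internal net driving it.
    """
    lines = []
    for net in list(input_pins.values()) + list(output_pins.values()):
        lines.append(f"  wire {net};")
    lines.append("")
    for port, net in input_pins.items():
        lines.append(f"  assign {net} = {port};")
    for port, net in output_pins.items():
        lines.append(f"  assign {port} = {net};")
    return "\n".join(lines)


# Pairings inferred from the listings above, not taken from the real tool.
text = emit_connections(
    {"i_led_red_control": "x0y10_pin_28_di_net_3",
     "i_led_blue_control": "x34y16_pin_88_di_net_8"},
    {"o_led_blue": "x10y5_mslice1_f_0_net_6",
     "o_led_red": "x15y13_mslice1_f_0_net_7"},
)
print(text)
```

The generated text would go between the port list and the first macro instance of the module.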

The target for now is to get these "simple" logic parts generated and check if it produces the same bit stream as the original.

For the equations I'm using a lookup table for two inputs and for more inputs I will make combinations based on NOT, AND and OR logic.

The fact that there are two LUT's and flip flops per slice makes it more complex to derive all the needed verilog macros from the settings, but with a step by step approach I will get there. It does involve a lot of probing a black box, due to the lack of documentation. But that is something the FPGA manufacturers don't want out in the world: in depth information about their products :(

Offline pcprogrammerTopic starter

« Reply #33 on: September 25, 2022, 11:51:02 am »
A first test showed some success. The two files generated so far (gate level verilog and pin constraint) allow me to generate a new bit stream, but for some reason the router swaps some of the logic between the two available parts in a slice.

It uses the same tiles and slices, but instead of using the flip flop connected to LUT0 it uses the one connected to LUT1 to make up the register from the original code. It sets the same number of bits to on and when I run the bit stream through my code again it looks the same, except for the swap of the two LUT's and flip flop. :palm:

I now have to see if this is also going to be the case with the actual FNIRSI-1013D bit stream, but before I can run that I need to implement a lot more code, and run lots of other tests.

Offline pcprogrammerTopic starter

« Reply #34 on: September 25, 2022, 02:10:53 pm »
Managed to solve the LUT and flipflop swapping. Had to do with the naming of the AL_MAP_SEQ macro :o

The AL_MAP_LUT macros use names like _al_ and the AL_MAP_SEQ macro uses the name of the register given in the original design. This name is of course unknown, but when I name it starting with reg_ instead of _al_ it fixes the swapping problem.

It was after this that I noticed another problem. The routing through the fabric was different and a better look at the generated verilog showed that signals were swapped on LUT inputs.

This is caused by the way I mapped the signals onto the AL_MAP_LUT macros, which seems to be the wrong way round. I assumed that the first input, starting from the fabric input "a", would be the "a" input of the macro, but the output of my code shows that it is then swapped compared to what the IDE generates for the gate level simulation. This means that the highest input, either "mi", "e", or "d" depending on the slice and LUT configuration, is the "a" input of the AL_MAP_LUT macro.
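
When the input order of a LUT changes, its truth table has to be reshuffled along with it, otherwise the function changes. The Python sketch below shows only that bookkeeping. It says nothing about which fabric input really ends up on which macro input, and the function name and the `mapping` convention are assumptions made for the example.

```python
def permute_init(init, num_inputs, mapping):
    """Recompute a LUT INIT after re-wiring the inputs.

    mapping[i] is the index of the old input (0 = a, 1 = b, ...) whose
    signal is now connected to new input i. The returned INIT gives the
    same logic function with the inputs connected in the new order.
    """
    result = 0
    for new_row in range(1 << num_inputs):
        # Work out which row of the old table this input combination hits.
        old_row = 0
        for new_bit, old_bit in enumerate(mapping):
            if (new_row >> new_bit) & 1:
                old_row |= 1 << old_bit
        if (init >> old_row) & 1:
            result |= 1 << new_row
    return result


# Reversing the inputs of a 3 input LUT that just passes "a" (INIT 0xAA)
# gives a LUT that passes "c" (INIT 0xF0).
print(hex(permute_init(0xAA, 3, [2, 1, 0])))  # 0xf0
```

Symmetric functions such as XOR come out unchanged, which is why a swap can go unnoticed in simple test designs.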

So some modification of the code is needed.

Offline pcprogrammerTopic starter

« Reply #35 on: October 06, 2022, 12:20:18 pm »
After a holiday break back on the project. By the looks of it getting a gate level verilog that will yield the exact same bit stream seems to be hard to accomplish. Had to make more tweaks after implementing the LUT3 macro and was able to get the exact same output from the original verilog and the generated gate level verilog.

But then after implementing the LUT4 macro it failed again. Found that it had to do with the naming of the macros and was able to get it back on track, but now after implementing the LUT5 macro it is off again :o

So will not try to get it to match, but will continue implementing the logic for generating the gate level verilog. Even if it does not yield the exact same bit stream it is still possible to verify the result by running the new bit stream through the reversal code and check the outcome against the first run.

As long as it is just routing that changes and not the logic equations the functionality should be the same. :)
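
One way to sketch that check in Python is to compare the logic found in both runs while ignoring instance names, net names and placement. The dictionary keys used here are hypothetical and not the format of the real block list. Matching counts are also only a necessary condition, since this does not look at how the cells are connected.

```python
from collections import Counter


def logic_signature(cells):
    """Count cells by (macro type, equation), ignoring names and placement."""
    return Counter((cell["macro"], cell["eqn"]) for cell in cells)


def compare_runs(first, second):
    """Return (missing, extra): logic only in the first or only in the second run."""
    a, b = logic_signature(first), logic_signature(second)
    return a - b, b - a


# Hypothetical records for two reversal runs.
first = [
    {"macro": "AL_MAP_LUT2", "eqn": "(B@A)"},
    {"macro": "AL_MAP_LUT2", "eqn": "~(~B*~A)"},
]
second = [
    {"macro": "AL_MAP_LUT2", "eqn": "(B@A)"},
    {"macro": "AL_MAP_LUT2", "eqn": "(~B*~A)"},
]
missing, extra = compare_runs(first, second)
print(dict(missing))
print(dict(extra))
```

Two empty results mean the same set of equations was found in both runs. Anything left over points at logic that changed and not just routing.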

The mapping of the inputs on the lookup tables is different to what I wrote before. For 1, 2 and 3 input LUT's it uses dedicated mappings, but for the 4 and 5 input LUT's there is a 1 to 1 relation between the signals.

At least it is all taking shape and even though still a lot to do, it is leading to a satisfactory result.

Offline pcprogrammerTopic starter

« Reply #36 on: October 10, 2022, 10:04:22 am »
It involves a lot of research through all the data to get to the bottom of what the different settings for the logic blocks do. Some are also hard to verify due to the fact that the IDE has a will of its own. For instance it won't create a LUT6 using one slice even when I specifically provide the LUT6 macro. It just splits it up in two or more separate LUT's spread over different slices.

Another one that is hard to verify is a LUT5 made within a single mslice. The bit stream of the original 1013D FPGA seems to have these in it, and another design, of which I have the actual verilog source, uses it too. It took some thinking and looking at the data to get an understanding of it. The easiest way to deal with it in my code is to create 3 LUT macros though.

The two LUT4's of the mslice get the same signals on their inputs, but not always in the same order, as can be seen in the attached pictures. This is why I now create two LUT4 macros and a LUT3 macro to select between the signals of the other two based on the "mi_0" input.
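
The split into two LUT4's plus a selecting LUT3 can be checked on paper with a small Python sketch. It assumes both LUT4's see the four signals in the same order and that the fifth input selects the upper half of the table when it is 1. If a LUT4 gets its inputs in another order, its INIT would have to be permuted to match. The helper names and the mux input assignment (a = first LUT4, b = second LUT4, c = select) are assumptions for the illustration.

```python
# LUT3 acting as a 2:1 mux: output is "a" when c = 0 and "b" when c = 1.
MUX_INIT = 0xCA


def split_lut5(init5):
    """Split a 32 bit LUT5 INIT into two LUT4 INITs on the fifth input."""
    low = init5 & 0xFFFF           # function when the fifth input is 0
    high = (init5 >> 16) & 0xFFFF  # function when the fifth input is 1
    return low, high


def eval_lut(init, inputs):
    """Evaluate a LUT: inputs[0] is 'a', inputs[1] is 'b', and so on."""
    row = sum(bit << i for i, bit in enumerate(inputs))
    return (init >> row) & 1


def eval_split(init5, inputs):
    """Evaluate the two LUT4 plus mux form for five input bits."""
    low, high = split_lut5(init5)
    f0 = eval_lut(low, inputs[:4])
    f1 = eval_lut(high, inputs[:4])
    return eval_lut(MUX_INIT, [f0, f1, inputs[4]])


# Compare the split form against the plain LUT5 for all 32 input rows.
init5 = 0xDEADBEEF
rows = [[(row >> i) & 1 for i in range(5)] for row in range(32)]
print(all(eval_split(init5, r) == eval_lut(init5, r) for r in rows))  # True
```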

Then there is the fact that slice 0 of every logic tile differs from slice 1, because there are some setting bits missing for this slice. Where slice 1 seems to route the "mi" input to the flip flop input by default, slice 0 has the "f" output routed to the flip flop and only switches to the "fx" output when the FX MUX is turned on. :palm:

If someone is ever going to make the open source place and route software for these devices they have a lot of stuff to figure out 8)


Offline pcprogrammerTopic starter

« Reply #37 on: October 12, 2022, 10:22:18 am »
I think I have the MSLICE handling down. Did some random checks on what is in the generated gate level verilog and the listed settings in the block list and it looks like it is matching up.

A full test by running it via the Tang Dynasty IDE is not possible due to the clock part not being done yet, and most likely a lot of signals are not connected since the rest of the logic is not yet generated. Did some tests with a simple 4 bit adder and registers to iron out the kinks though.

The LSLICE has more abilities so will be a bit more work, but with the increased knowledge gathered from doing the MSLICE it should go faster.

Offline pcprogrammerTopic starter

« Reply #38 on: October 13, 2022, 11:12:23 am »
The LSLICE was less work than I thought. Just implemented what is used in the 1013D bit stream, so not suitable for full reverse engineering of every bit stream out there.

For the PLL I already did some exploring and have the verilog IP for it, which I will manually copy into the gate level verilog file I'm generating, so what is left is the block ram. This means looking through the settings bits to generate the proper settings for the parameters of the block ram macro, and connect all the signals to the blocks.

A fair bit of coding but not too complex.

And when that is done the big question is, if what I generate compiles into a bit stream that does what the original bit stream does.

If so it will then need translation to more readable verilog, which will not be that simple, but will cross that bridge when I get to it :)

Offline pcprogrammerTopic starter

« Reply #39 on: October 15, 2022, 12:14:05 pm »
Well, I managed to get output that can be compiled with the IDE, but whether it is correct still needs to be verified.

Due to the fact that I manually did the clock signals I can't just run my reverse engineering program on the newly generated bit stream and expect it to be able to compile again. When the clock generation blocks are moved to different tiles the names I have fixed in the code will not be the correct ones :(

But a quick inspection showed me that the newly generated bit stream uses about 3000 fewer bits, and I'm not sure if this is just due to a different placement and routing.

One way to test of course is to load it into the actual scope and see what it does. So fingers crossed and hope it does not blow up :palm:

I will do a bit more checking of what is generated though.

Offline pcprogrammerTopic starter

« Reply #40 on: October 15, 2022, 01:21:43 pm »
A quick look at the settings bits and the block connections showed me that the global clocks are not used for some reason. That certainly explains the missing 3000 settings bits |O

So more testing of the gate level verilog generation is needed to see if I can solve this. No need to test the bit stream I have now in the actual scope, because without clocks it will not work.

Edit: It was a case of case :palm: I used "GCLK" in my manually assigned names and the rest used "gclk". The bit count went up by more than 2000, so there is still a bit of a difference, and the net list is also longer. More work to be done 8)
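
Verilog identifiers are case sensitive, so two names that differ only in case are separate nets. A small Python sketch of a check that flags such names before the verilog goes into the IDE; the function name and the example net names are made up.

```python
from collections import defaultdict


def case_collisions(names):
    """Group identifiers that differ only in letter case."""
    groups = defaultdict(set)
    for name in names:
        groups[name.lower()].add(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Hypothetical net names, mixing a hand written name with generated ones.
print(case_collisions(["GCLK_0", "gclk_0", "gclk_1"]))  # [['GCLK_0', 'gclk_0']]
```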

Edit2: Tried the generated bit stream on the scope but, as somewhat expected, it did not work. When I upload the original, the scope stops working during the upload and needs a reset to come back up again, but it does work. With the new bit stream it does not come back after the reset. Definitely more work to be done. But I do have some ideas based on new knowledge gained.
« Last Edit: October 15, 2022, 04:34:42 pm by pcprogrammer »
 

Offline pcprogrammerTopic starter

« Reply #41 on: October 16, 2022, 02:26:08 pm »
"What a mistaka to maka"  :o

I found out today that I forgot to cater for the bidirectional pins. I'm handling input and output pins but am not making the needed connections for the bidirectional pins, and these are needed for the interface with the MCU :palm:

As another experiment I changed the code to just use the SLICE macros, because these can represent all the logic made with one block, which is easier than creating the LUT macros. The resulting output did compile in the IDE, but still no dice on the scope. It also had ~2500 fewer setting bits than the original. So I started looking at the net list, block list and setup list again and noticed that the bidirectional pins had changed into output pins. |O

So another change of the code is needed and hopefully it will then bring success.

Offline pcprogrammerTopic starter

« Reply #42 on: October 16, 2022, 03:52:25 pm »
Aaaaaaaargh so close, but still no dice on the scope.

The net list of the newly generated bit stream now has one net more than the original, and the same goes for the block list. It has one extra block :o

Due to different routing and placing it still has less bits set, but not sure if that can cause it to not work. Routing makes up for a fair number of the used bits. But maybe some timing constraints that are not met makes it fail?

Or I made an error in the clock connections. The design uses 6 global clocks and it is still not completely clear how these clocks are connected to the global routes.

Ah well, more digging to be done.

Offline cedric!

  • Contributor
  • Posts: 31
  • Country: nl
Re: Reverse engineering Anlogic AL3_10 FPGA
« Reply #43 on: October 16, 2022, 06:58:23 pm »
I have no knowledge of Anlogic FPGA's, but there are other projects [1] that are reverse engineering FPGA bitstreams. They have projects for Lattice iCE40[2], Lattice ECP5[3], Xilinx 7[4], Xilinx Ultrascale, Ultrascale+ and UltraScale MPSoC [5]

The selling point of those projects is that they have documented the process of reverse engineering. Maybe you can (ask to) create a reverse engineering project for Anlogic FPGA's?

[1] https://f4pga.org/
[2] https://github.com/F4PGA/icestorm
[3] https://github.com/F4PGA/prjtrellis
[4] https://github.com/F4PGA/prjxray
[5] https://github.com/f4pga/prjuray




 

Offline pcprogrammerTopic starter

« Reply #44 on: October 16, 2022, 07:16:27 pm »
Hi cedric!, thanks for the input, but I'm aware of these kinds of projects, which are targeted at reverse engineering the process of generating a bit stream, for use with open source place and route software.

There is a project for the Anlogic FPGA's based on prjtrellis. (See my first post at the start of this topic) I have used that to get the listings of what a setting bit means and it was certainly helpful, but I still had to figure out a hell of a lot myself. The same applies to Gowin. It was actually the Gowin project that provided some needed information to be able to continue with my project.

A bit of a problem with these somewhat experimental projects is that they are never finished: they only fit a single type of the manufacturer's FPGA line, or do not support specific IP like embedded ram, etc. The same is true for what I'm doing. It is just targeted at the FNIRSI 1013D FPGA bit stream.

What I'm trying to do is to reverse engineer an existing bit stream to verilog, which is a whole other ballgame. As you can read in this thread, all the data I found from the mentioned open source project for the Anlogic FPGA's is for going from verilog to a bit stream and not the other way round.

I'm actually quite close to a working result now.



Offline cedric!

« Reply #45 on: October 16, 2022, 07:20:35 pm »
There is a project for the Anlogic FPGA's based on prjtrellis. (See my first post at the start of this topic)

Sorry, I missed that.
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14440
  • Country: fr
Re: Reverse engineering Anlogic AL3_10 FPGA
« Reply #46 on: October 16, 2022, 07:45:17 pm »
Does that imply that Anlogic FPGAs have an architecture very close to that of Lattice ones?
Isn't that more or less the case for Gowin FPGAs too?
 

Offline pcprogrammerTopic starter

Re: Reverse engineering Anlogic AL3_10 FPGA
« Reply #47 on: October 17, 2022, 05:15:43 am »
I'm not familiar with the Lattice devices, but if they are similar to the Gowin devices then probably yes.

The work done on the Gowin devices by Pepijn de Vos (https://github.com/YosysHQ/apicula) showed me how the routing through the chip is done. Whether the rest of the setup is the same, I don't know yet.

But the principles for reverse engineering a bit stream back to verilog will be the same: get to know the meaning of the bits, derive a net list from them, create a list of the logic settings (for example whether a slice is an adder, or an embedded memory block is a FIFO), and turn that into gate level verilog.

The step of turning it into usable verilog is one I still have to take. The gate level code is fine for figuring out parts of how the design works, but it is not usable for making changes.
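
To illustrate the gap between the two levels, here is a minimal sketch. Everything in it is made up for the example: lut4_sketch is a generic stand-in and not an Anlogic library primitive, and the module names, port names and INIT value do not come from the actual bit stream. It only shows how a LUT truth table plus a flip-flop at gate level corresponds to one readable line.

```verilog
// Generic stand-in for a 4-input LUT; hypothetical, not an Anlogic primitive.
module lut4_sketch #(
    parameter [15:0] INIT = 16'h0000   // truth table, as it would be read from the setting bits
) (
    input  wire a, b, c, d,
    output wire f
);
    wire [15:0] truth = INIT;
    assign f = truth[{d, c, b, a}];    // the inputs form the index into the truth table
endmodule

// Gate level style: one LUT plus one flip-flop, connected through a named net.
module slice_gate_level (
    input  wire clk, in_a, in_b,
    output reg  q
);
    wire lut_f;
    // INIT 16'h6666 gives f = a ^ b regardless of c and d.
    lut4_sketch #(.INIT(16'h6666)) u_lut (
        .a(in_a), .b(in_b), .c(1'b0), .d(1'b0), .f(lut_f)
    );
    always @(posedge clk) q <= lut_f;
endmodule

// Readable equivalent of the same logic.
module slice_readable (
    input  wire clk, in_a, in_b,
    output reg  q
);
    always @(posedge clk) q <= in_a ^ in_b;
endmodule
```

For a single slice the translation is trivial. The difficulty lies in doing it for hundreds of slices whose nets only carry position based names.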

Offline pcprogrammerTopic starter

Re: Reverse engineering Anlogic AL3_10 FPGA
« Reply #48 on: October 17, 2022, 08:47:22 am »
I have tried playing with the settings of the IDE and the two possible modes of doing tri state on the MCU interface pins, but for some reason it just fails to do what I want.

I tried loading all the generated bit streams to the scope.

Code:
  assign io_mcu_d_0 = x12y23_lslice3_f_0_net_481 ? 1'bZ : x11y20_lslice3_q_0_net_365;
  assign io_mcu_d_1 = x12y23_lslice3_f_0_net_481 ? 1'bZ : x11y18_lslice3_q_0_net_357;
  assign io_mcu_d_2 = x12y23_lslice3_f_0_net_481 ? 1'bZ : x13y18_lslice2_q_0_net_551;
  assign io_mcu_d_3 = x12y23_lslice3_f_0_net_481 ? 1'bZ : x11y15_lslice3_q_0_net_344;
  assign io_mcu_d_4 = x12y23_lslice3_f_0_net_481 ? 1'bZ : x13y19_mslice1_q_0_net_557;
  assign io_mcu_d_5 = x12y23_lslice3_f_0_net_481 ? 1'bZ : x11y18_mslice1_q_0_net_358;
  assign io_mcu_d_6 = x12y23_lslice3_f_0_net_481 ? 1'bZ : x11y20_mslice1_q_0_net_362;
  assign io_mcu_d_7 = x12y23_lslice3_f_0_net_481 ? 1'bZ : x12y16_mslice1_q_0_net_438;

With the above, the scope's screen stays black after a reset.
With the version below, the scope comes back to life after a reset, but is then frozen. This might well be due to the fact that bit 0 is used to detect if the FPGA is finished with acquisition 8)

Code:
  assign io_mcu_d_0 = x12y23_lslice3_f_0_net_481 ? x11y20_lslice3_q_0_net_365 : 1'bZ;
  assign io_mcu_d_1 = x12y23_lslice3_f_0_net_481 ? x11y18_lslice3_q_0_net_357 : 1'bZ;
  assign io_mcu_d_2 = x12y23_lslice3_f_0_net_481 ? x13y18_lslice2_q_0_net_551 : 1'bZ;
  assign io_mcu_d_3 = x12y23_lslice3_f_0_net_481 ? x11y15_lslice3_q_0_net_344 : 1'bZ;
  assign io_mcu_d_4 = x12y23_lslice3_f_0_net_481 ? x13y19_mslice1_q_0_net_557 : 1'bZ;
  assign io_mcu_d_5 = x12y23_lslice3_f_0_net_481 ? x11y18_mslice1_q_0_net_358 : 1'bZ;
  assign io_mcu_d_6 = x12y23_lslice3_f_0_net_481 ? x11y20_mslice1_q_0_net_362 : 1'bZ;
  assign io_mcu_d_7 = x12y23_lslice3_f_0_net_481 ? x12y16_mslice1_q_0_net_438 : 1'bZ;

Research on the latter shows that the IDE just removes the select signal on io_mcu_d_0 and io_mcu_d_7, but connects it for the remaining 6 signals. The question is of course why the IDE does this.
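
For reference, the two variants can be written more compactly as a single vector assign. This is only a sketch: the module and signal names (mcu_bus_sketch, bus_release, read_data, mcu_d) are made up for illustration and are not names from the original design or the IDE.

```verilog
// Hypothetical compact form of the eight per-bit assigns shown above.
module mcu_bus_sketch (
    input  wire       bus_release,  // stands in for x12y23_lslice3_f_0_net_481
    input  wire [7:0] read_data,    // stands in for the eight slice q outputs
    inout  wire [7:0] mcu_d         // stands in for io_mcu_d_0 .. io_mcu_d_7
);
    // First variant: select high releases the bus, select low drives it.
    assign mcu_d = bus_release ? 8'bz : read_data;

    // Second variant (inverted select), kept as a comment so that two
    // drivers do not fight on the same net:
    // assign mcu_d = bus_release ? read_data : 8'bz;
endmodule
```

Whether the IDE handles this vector form any differently from the per-bit assigns is untested here.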

All the testing I did shows that it is possible to generate gate level code that compiles into the exact same bit stream, but it has to adhere to some weird rules that don't apply all the time. For example, the naming of the macros mattered in getting it right, but the same naming then failed again in the next experiment.

But I gained a lot of knowledge about it all, and I need to take another step to get it done: work it back to some proper verilog. This means analyzing the generated gate level verilog in some way. One idea I started with earlier needs a revisit: making a tree list of the blocks by tracing io signals. Since I know the external connections it is possible to identify different sections of the code based on this idea, and use it to write up the needed proper verilog.

I have already identified the PLL and embedded memory parts and have the IP for them.

It would have been nice if the gate level code resulted in a working bit stream, because that would have proved the premise.

And so the story continues  :popcorn:

Offline pcprogrammerTopic starter

Re: Reverse engineering Anlogic AL3_10 FPGA
« Reply #49 on: October 19, 2022, 09:17:08 am »
For some reason the IDE just does not handle the gate level code that well.

I have tried a few more things, but it consistently refuses to connect a couple of the enable pins on the tri state data bus. For the latest version it was d1 and d6. It showed the same result on the scope: it just sits frozen on a normal scope screen.

Therefore I moved on to do a manual conversion to the top level verilog. It is a bit like going back from assembler to C, but more difficult since it is a lot of parallel "code" to interpret.

I searched the net a bit, but did not find anything useful for it.

I guess it is a good exercise for getting to know verilog better, and maybe I will gain some insight into how it could be automated.

