"Macros, in this case, are obfuscators."
Nope. *sigh* This is always the case when trying to explain something on a forum... I try to give the minimum info so the intent would be clear, but then it gets misunderstood.
This has nothing to do with obfuscation; on the contrary, it is intended to make it easier to create state machines and logic that can sync up.
You make definitions for all kinds of events and items that need to happen at certain timestamps. In essence you are creating a language in which you describe the machine and what it does: what it waits for, for how long, what it syncs with, and so on.
These 'keywords' are then defined as Verilog backtick macros (`define ...).
When reading the machine description you don't need to wade through a bunch of complicated syntax. It is an English text that describes what happens. The underlying Verilog can be complex and span a block of separate modules.
case (revolution_state)
  `index: begin
    `start_revtimer;
    `wait_for_sector_sync;
  end
  `first_sector: begin
    `get_sectorcount;
    `current_sector <= 1;
    `start_sector_handler;
    `wait_for_sector_sync;
  end
  `last_sector: begin
    blabla...
  end
  default: `wait_for_revolution_sync;
endcase
If this machine's flow needs to change, it is easy to see where to alter the code.
In semiconductor design many systems are built like that. They use Tcl or some other scripting tool to describe the desired system in a macro language; that then gets translated into instantiated blocks of Verilog that are full of such `defines. The low-level designers make the primitive blocks needed and the `define headers. The rest is driven from outside the Verilog world.
Nobody reads the actual 'generated' Verilog code. A module is designed, simulated, and done. How the modules fit together and interact with each other: that Verilog is never seen by humans, as it is generated by another tool.
I had over 300 control registers that needed mapping into dual-port memory or hard registers. One side goes to the CPU, the other is hardware. I made a software tool where I can describe the registers and their layout, and it would generate the Verilog, the C code for the ARM, the C code for the 8051, the transport code for the USB packet handler, and the Visual Basic code to visualize the data.
Example:
reg 0 r/w, [15:14] state, [13] ready, [12:0] version;
reg 1 w,   [15:13] 0, [12:0] desired_speed;
reg 1 r,   [15:13] 0, [12:0] current_speed;
This would define the registers in the FPGA and map them through an address decoder.
On the ARM you got variables defined, hard-cast at addresses that map into the FPGA.
In the ARM code I could just write desired_speed = current_speed + 1; since both are hard-mapped there is no 'I/O code'.
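A minimal C sketch of what such a generated ARM header could look like. In the real generated code each field would be a fixed bus address, something like `#define desired_speed (*(volatile uint16_t *)0x40000002)`; here the 'addresses' are backed by a plain array so the sketch runs stand-alone, and all names and indices are illustrative, not from the actual tool.

```c
#include <stdint.h>

/* Stand-in for the FPGA address space; in the real system these slots are
 * bus addresses decoded by the FPGA, not RAM. */
static uint16_t fpga_space[4];

/* Each field of the register table becomes its own 'variable'. */
#define current_speed (fpga_space[2])   /* reg 1, read view  */
#define desired_speed (fpga_space[3])   /* reg 1, write view */

/* With the mapping in place, application code is a plain assignment:
 * no SPI handler, no masking, no 'I/O code'. */
void bump_speed(void)
{
    desired_speed = current_speed + 1;
}
```

The point is that the application never sees a register, only named variables; the decoding lives entirely on the FPGA side.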
In Visual Basic an object was created called current_speed. If you 'read the object', a call is done over USB to a C function on the 8051 that picks up the data from the DPRAM in the FPGA and returns it.
Same thing with desired_speed. All that stuff was totally handled 'under the hood'.
When a new chip came along, all I had to do was write the layout of the registers. Write-only registers were automatically 'shadowed' in the FPGA, so neither ARM nor 8051 nor PC had to do any of that.
The real-time code on the ARM was a short program to spin up the motor and perform seek operations. Doing that over USB would have been impossible due to latency and command jitter. So the ARM did those things in lock-step with assistance from the FPGA.
The 8051 code was just I/O transport into the registers and DPRAM. That was hardcoded and never needed altering, as the 'mapping' was handled by the FPGA.
Basically the top-level program did not deal with registers. If a register contained multiple bit fields, those all became individual 'variables' or 'objects' that could be read and written. The FPGA Verilog made sure to 'assemble' all the required bits and pieces to form the true 'register' content and blast that off into the real control register. The software did not have to do AND/OR masking to alter one bit in a register. The bit was accessible as a boolean.
Development of the platform went from months to half a day. Spend an hour making the table, send it through the compilers, and while waiting for synthesis, write the UI and ARM code. The Verilog took 3 hours to synthesize. It filled the largest Cyclone V to 80% capacity; I think that was a 35-million-gate device. On some large SoCs with hundreds of registers and thousands of variables, the Verilog was hundreds of thousands of lines.
The tool created a .h file for the ARM, a .h file for the 8051, a .vb file for the user interface, and a .v file defining all the registers and the access code for them. In the FPGA the logic would 'trap' a write operation to a variable and fire off the serial protocol handler to shift the entire register contents out into the SoC. The protocol handler was also 'programmable', and the protocol had its own description language.
The user interface on the PC used custom controls I designed that would lock onto the 'variable' objects. Building the UI was a matter of hours.
The whole system had the processors run in lockstep with the FPGA. The FPGA generated the clock for the CPUs and controlled the bus cycles. The FPGA emulated the RAM and ROM for the ARM, so the processor did not know that behind a memory location there was actually an enormous block of logic that disassembled and reassembled registers.
Why not do this in software? Because the hardware can do it much faster!
Think about it this way:
You have an SPI bus accessing a 16-bit register. This register has three fields: bits 15 to 11, bits 10 to 6, and bits 4 to 0, called x, y and z.
You need to write a piece of code on the CPU that does the following:
x = ((y + 3) * z) / 32
So you need to call an SPI handler to read the register, then you need to unpack the bitfields to create three integers, then you do the calculation, then you need to repack (because you need to leave y and z intact but only update bits 15 to 11, x), and then you need to do another SPI operation.
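Those steps can be sketched as the conventional C routine below. The `spi_read()`/`spi_write()` functions are hypothetical stand-ins for a real SPI driver, backed here by a local variable so the sketch is self-contained.

```c
#include <stdint.h>

/* One 16-bit device register with three 5-bit fields:
 * x = bits [15:11], y = bits [10:6], z = bits [4:0]. */
static uint16_t remote_reg;                        /* simulated device register */
static uint16_t spi_read(void)        { return remote_reg; }
static void     spi_write(uint16_t v) { remote_reg = v; }

/* Compute x = ((y + 3) * z) / 32 and write it back, leaving y and z intact. */
void update_x(void)
{
    uint16_t r = spi_read();                       /* 1: first SPI transaction  */
    unsigned y = (r >> 6) & 0x1Fu;                 /* 2: unpack the bitfields   */
    unsigned z =  r       & 0x1Fu;
    unsigned x = (((y + 3) * z) / 32) & 0x1Fu;     /* 3: the actual computation */
    r = (uint16_t)((r & 0x07FFu) | (x << 11));     /* 4: repack, keep y and z   */
    spi_write(r);                                  /* 5: second SPI transaction */
}
```

Two bus transactions plus all the shifting and masking, for one line of actual work.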
My system: I didn't do diddly squat. x, y and z are all variables hard-located somewhere in 'RAM' of the processor. The FPGA knows they are each 5 bits, but it gives them to me as three int32s with the highest bits already set to 0, so all the bit masking and shifting is handled in hardware. Actually it is not even 'shifting', as all this stuff is done in one clock tick.
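For contrast, here is the same job on the described system, sketched in C. Plain variables stand in for the FPGA-backed hard locations, and the names are illustrative; the point is that the entire pack/unpack/SPI dance disappears.

```c
#include <stdint.h>

/* In the real system these are hard-located addresses the FPGA decodes;
 * each field already arrives as an int32 with the upper bits zeroed. */
static int32_t x, y, z;

/* The whole routine: no SPI call, no masking, no shifting. */
void update_x_direct(void)
{
    x = ((y + 3) * z) / 32;
}
```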
reg [15:0] something;
`define x_read data_out[15:0] = {11'b0, something[15:11]}
`define y_read data_out[15:0] = {11'b0, something[10:6]}
`define z_read data_out[15:0] = {11'b0, something[4:0]}
`define x_write something[15:11] <= data_in[4:0]
always_ff @(posedge clk)
  if (write_event)
    case (address)
      variable_x : `x_write;
      .....
    endcase

// read cycle: perform the SPI access to fetch the register contents, then decode:

always_comb
  case (read_address)
    variable_x : `x_read;
    variable_y : `y_read;
  endcase
The FPGA runs the bus cycle of the processor. When it sees a read event come in, it knows the target address. The CPU will only pick up the data 8 clock ticks later. (Hard-defined by the ARM bus cycle; you could set the cycle independently for read and write. Write was a single cycle, but read was 8 clock ticks. The ARM bus clock is 50 MHz, generated by the FPGA in sync with its master clock. The FPGA's internal clock is 500 MHz... that gave me 80 FPGA clock ticks. I needed 8 ticks to send the address, 1 tick for the read bit, 1 wait tick, and 16 ticks to fetch the data; the SPI-like bus clock was 160 MHz. So I was guaranteed to be able to do one transport within one bus cycle of the ARM.) The processor picks up the data and goes on, completely unaware that a bunch of stuff happened over a serial bus to fetch it. The return path is one huge combinatorial address decoder that strips the data and returns what is needed directly in the bus format of the CPU. No need on the processor to do AND-masking, OR-masking or shifting (or things like sign extension, or combining two registers into one variable: a 32-bit number split over two 16-bit registers).
I also had a DMA-like mechanism that would prevent the CPU having to wait for multiple writes. The FPGA would wait until it got a trigger that basically said: send all 'dirty' registers over. The CPU could then do other things. If it needed lockstepping, those write operations used a different bus cycle that guaranteed the data leaving before the CPU could execute its next instruction. The ARM7 has 4 or 8 definable bus cycles, and you can control which cycle is used for which memory range. So for data that always needed to leave before continuation, you used a slower cycle.
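The 'send all dirty registers' idea can be sketched as below: a write only marks a shadow copy dirty, and a later trigger flushes every dirty register over the serial link in one go, so the CPU never stalls per write. All names are illustrative and the actual transport is left as a comment; this models the behavior, not the real FPGA implementation.

```c
#include <stdint.h>

#define NREGS 8
static uint16_t shadow[NREGS];   /* CPU-side shadow copies       */
static uint8_t  dirty[NREGS];    /* one dirty flag per register  */

/* A write costs nothing on the bus: just update the shadow and mark it. */
static void reg_write(int i, uint16_t v)
{
    shadow[i] = v;
    dirty[i]  = 1;
}

/* The flush trigger: push every dirty register over the serial link.
 * Returns how many transports were done. */
static int flush_dirty(void)
{
    int sent = 0;
    for (int i = 0; i < NREGS; i++)
        if (dirty[i]) {
            /* serial transport of shadow[i] would happen here */
            dirty[i] = 0;
            sent++;
        }
    return sent;
}
```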
That system has been in use since 2007 and everyone loves it. The generated Verilog is 'unreadable' by humans, but that is not required anyway.
Whenever you have peripheral blocks with large register sets, each containing multiple fields: abstract it. It is madness to try to code that by hand. It is mind-numbingly boring, and if the register map changes during design you are royally screwed. Good luck editing thousands and thousands of lines to relocate a variable or a field. The semiconductor design toolkits are full of 'generators': feed them a table and they generate the Verilog for you.
Look at the design tools themselves. They come with macro wizards: I need this function, this is in, that is out, this is the clock, this and that happen there and then, a is rising, b is falling, and z is asynchronous. Click a button and you end up with a thousand-line Verilog block. Do you ever read that? No. Would you ever craft that by hand from nothing? No! That is madness.
I didn't invent this. This is standard practice. It's a macro generator for a very complex SPI-like device with hundreds of registers. It generates the hardware and the 'co-ware' (the ARM code) to interact with it in, from a development perspective, the shortest amount of time.
So what if the generated code is unreadable for humans and uses macros and many other things? You are not supposed to muck in that anyway. If it needs changing: use the top-level tool.
That being said: you can still exploit this by hand for smaller systems. Make a 'human-like' language using macro definitions. That makes the flow of the state machine easier to understand than having to read through a bunch of syntax 'trees' that hide the forest (that which needs to be done). I will craft the forest first; the trees are small macros. If my forest needs to change (trees further apart, more pines than oaks, a narrower but longer forest), I can do that in my 'forest description language'. That then gets 'translated' into tree instances.