Author Topic: Verilog automatic one-hot encoding for state names (Read 7097 times)

rstofer · « **on:** January 24, 2022, 03:26:41 am »

In VHDL, I might create a type for state names and just populate it with labels. The tool will automatically assign one-hot values for each state and I never need to deal with the actual value. Of course 'state' and 'next_state' can get pretty wide but that's the cost of not having to decode values.

Code: [Select]

type state_type is ( s0,  s0a, s0b, s0c,
                     Fetch, s1a, s1b, s1b1, s1c,
                     s2,  s2a, s2b,
                     s3,  s4,  s5,  s6,
                     s7a, s7b, s7b1, s7c, s7d,

                     ...

                     MemWr, MemWr1);


There might be a lot of states (I left out a LOT!) and manually assigning values in Verilog seems unworkable. So, how is it done in the real world?

I have also been known to define the states in a VHDL package file. Maybe a Verilog 'include' file?
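
One way this could look without typing any wide constants, as a minimal sketch: each state gets a small bit index and the one-hot value is derived from it. The package name `fsm_states_pkg`, the helper `onehot`, the reduced four-state list and the transitions are all assumptions made for illustration. Plain Verilog-2001 has no packages, so there the same localparams would sit in a file pulled in with `include.

```systemverilog
// Sketch only: state list shortened, transitions invented for illustration.
package fsm_states_pkg;
  localparam int N_STATES = 4;

  // One small index per state; adding a state means adding one line here.
  localparam int S0_IDX     = 0;
  localparam int FETCH_IDX  = 1;
  localparam int MEMWR_IDX  = 2;
  localparam int MEMWR1_IDX = 3;

  // Hypothetical helper: builds the one-hot constant from a bit index.
  function automatic logic [N_STATES-1:0] onehot(input int idx);
    logic [N_STATES-1:0] v;
    v      = '0;
    v[idx] = 1'b1;
    return v;
  endfunction

  localparam logic [N_STATES-1:0] S0     = onehot(S0_IDX);
  localparam logic [N_STATES-1:0] FETCH  = onehot(FETCH_IDX);
  localparam logic [N_STATES-1:0] MEMWR  = onehot(MEMWR_IDX);
  localparam logic [N_STATES-1:0] MEMWR1 = onehot(MEMWR1_IDX);
endpackage

module fsm_sketch (
  input  logic clk,
  input  logic rst,
  output logic in_fetch
);
  import fsm_states_pkg::*;

  logic [N_STATES-1:0] state, next_state;

  always_comb begin
    next_state = state;                              // hold by default
    if      (state[S0_IDX])     next_state = FETCH;
    else if (state[FETCH_IDX])  next_state = MEMWR;
    else if (state[MEMWR_IDX])  next_state = MEMWR1;
    else if (state[MEMWR1_IDX]) next_state = FETCH;
  end

  always_ff @(posedge clk)
    if (rst) state <= S0;
    else     state <= next_state;

  // "Am I in Fetch?" is a single flop output, no decoder needed.
  assign in_fetch = state[FETCH_IDX];
endmodule
```

The indices still have to be numbered by hand, which is the part a VHDL enumerated type does for you.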

BrianHG · « **Reply #1 on:** January 24, 2022, 03:47:50 am »

Have you tried just using actual 'strings', i.e. naming each state by its actual name?
It can make debugging a complex machine easy to read.
ModelSim and Quartus appear to work with strings, with a few warnings (e.g. a limit on character length).

julian1 · « **Reply #2 on:** January 24, 2022, 08:42:30 pm »

verilog enums?

https://verificationguide.com/systemverilog/systemverilog-enum/

Bassman59 · « **Reply #3 on:** January 24, 2022, 08:46:22 pm »

Quote from: rstofer on January 24, 2022, 03:26:41 am

In VHDL, I might create a type for state names and just populate it with labels. The tool will automatically assign one-hot values for each state and I never need to deal with the actual value. Of course 'state' and 'next_state' can get pretty wide but that's the cost of not having to decode values.

SystemVerilog lets you create an enumerated type, as VHDL has offered since the beginning.
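
A minimal sketch of such an enumerated type in a two-process FSM. The module name, the three states and the `start`/`busy` signals are assumptions for illustration, not anything from the original design.

```systemverilog
module enum_fsm_sketch (
  input  logic clk,
  input  logic rst,
  input  logic start,
  output logic busy
);
  // Labels only, like the VHDL type. Without explicit values the members
  // are numbered 0, 1, 2 in declaration order.
  typedef enum logic [1:0] { IDLE, RUN, DONE } state_t;

  state_t state, next_state;

  // Combinational next-state process.
  always_comb begin
    next_state = state;
    case (state)
      IDLE:    if (start) next_state = RUN;
      RUN:     next_state = DONE;
      DONE:    next_state = IDLE;
      default: next_state = IDLE;
    endcase
  end

  // Clocked state register.
  always_ff @(posedge clk)
    if (rst) state <= IDLE;
    else     state <= next_state;

  assign busy = (state != IDLE);
endmodule
```

In simulation the values here are plain binary; whether the synthesized machine ends up one-hot depends on the tool and its settings. Explicit values can also be given per label (for example `IDLE = 3'b001`) if a fixed encoding is wanted.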

BrianHG · « **Reply #4 on:** January 24, 2022, 09:57:43 pm »

Quote from: julian1 on January 24, 2022, 08:42:30 pm

verilog enums?

https://verificationguide.com/systemverilog/systemverilog-enum/

Learn something new and useful every day.

Sort of like recently, when I found that '$bits(packed typedef structure)' returns the number of wires inside my structures. That makes it easy to feed one of my structures in and out through a Quartus FIFO: I use $bits to set the FIFO's width, so I don't have to worry whenever I add or remove buses inside the typedef feeding the FIFO's data-in -> data-out port.

$bits() works with any logic, wire, array, or structure, and it should also work with an enum.
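
A small sketch of that use of $bits. The struct fields and the names `pixel_pkg`, `pixel_t` and `fifo_wrap_sketch` are made up for illustration, and a single register stands in for the vendor FIFO.

```systemverilog
package pixel_pkg;
  // Hypothetical bundle; add or remove fields and the width follows.
  typedef struct packed {
    logic        valid;
    logic [7:0]  red, green, blue;
    logic [10:0] x;
  } pixel_t;
endpackage

module fifo_wrap_sketch
  import pixel_pkg::*;
(
  input  logic   clk,
  input  pixel_t px_in,
  output pixel_t px_out
);
  // 1 + 3*8 + 11 = 36 bits for the struct above.
  localparam int FIFO_WIDTH = $bits(pixel_t);

  logic [FIFO_WIDTH-1:0] fifo_din, fifo_dout;

  assign fifo_din = px_in;                 // packed struct flattens to a vector

  // Stand-in for the FIFO: one register of the same width.
  always_ff @(posedge clk)
    fifo_dout <= fifo_din;

  assign px_out = pixel_t'(fifo_dout);     // cast the vector back to the struct
endmodule
```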

asmi · « **Reply #5 on:** January 24, 2022, 10:02:56 pm »

Quote from: BrianHG on January 24, 2022, 09:57:43 pm

Learn something new and useful every day.

Take a look at this as well: https://verificationguide.com/systemverilog/systemverilog-interface-construct/
I think interfaces are one of the most useful features of SystemVerilog, as they allow you to de-clutter many connecting wires and bunch them up into interfaces - kind of like how we bunch up individual wires in real life.

Also, you can omit the enum base type, leaving it to the synthesizer to determine the best encoding method (one-hot, Gray code, sequential, Johnson), or you can force any particular encoding. Some simulation tools can display enum names directly instead of numerical values, which makes debugging much easier.

BrianHG · « **Reply #6 on:** January 24, 2022, 10:25:50 pm »

Quote from: asmi on January 24, 2022, 10:02:56 pm

Quote from: BrianHG on January 24, 2022, 09:57:43 pm
Learn something new and useful every day.
Take a look at this as well: https://verificationguide.com/systemverilog/systemverilog-interface-construct/
I think interfaces are one of the most useful features of SystemVerilog, as they allow you to de-clutter many connecting wires and bunch them up into interfaces - kind of like how we bunch up individual wires in real life.

In my new Multiwindow VGA controller, I went with a system-wide 'typedef struct packed { };'. I had a progression of pipelined processing stages, many of them in parallel as an array, that forward the picture data along while applying palettes and transformations, and with the struct I did not need to maintain a separate train of timed control signals. You can use a typedef as an input/output port for your interface as well as internally for additional logic, and you can array your structures when creating an interface IO where you might want to instantiate multiple copies of a piece of code.

rstofer · « **Reply #7 on:** January 25, 2022, 12:32:30 am »

The enums of SV would probably work as long as I can make the eventual value one-hot instead of integer (or something). I want to avoid the 7 to 100 demux to get the state for subsequent logic expressions. One-hot gives me just a single bit that I can use without worrying about decoding.

Yes, state and next_state will have 100 flops each but one-hot decoding is FAST.

To me, it's a non-starter for Verilog because, in my readings, I don't see anyone reference doing something like what I can do in SV or VHDL.

Alas, I was just looking at Verilog for enrichment. I think I'll just stick with what I know - VHDL. Then again, I have a book on SV, maybe I should read it! Maybe skip the Verilog stuff altogether.

BrianHG · « **Reply #8 on:** January 25, 2022, 12:47:27 am »

Quote from: rstofer on January 25, 2022, 12:32:30 am

Then again, I have a book on SV, maybe I should read it! Maybe skip the Verilog stuff altogether.

free_electron · « **Reply #9 on:** January 25, 2022, 04:12:20 am »

Quote from: rstofer on January 25, 2022, 12:32:30 am

I want to avoid the 7 to 100 demux to get the state for subsequent logic expressions.

What makes you believe it will do that? Demuxes are built using lookup tables in the FPGA, just like one-hot decoders.

asmi · « **Reply #10 on:** January 25, 2022, 06:45:29 am »

Quote from: BrianHG on January 24, 2022, 10:25:50 pm

In my new Multiwindow VGA controller, I went with a system-wide 'typedef struct packed { };'. I had a progression of pipelined processing stages, many of them in parallel as an array, that forward the picture data along while applying palettes and transformations, and with the struct I did not need to maintain a separate train of timed control signals. You can use a typedef as an input/output port for your interface as well as internally for additional logic, and you can array your structures when creating an interface IO where you might want to instantiate multiple copies of a piece of code.

Interfaces are better for interconnect because they have a concept of modports (MODule PORT), which let you specify which wires are available to each port, as well as each wire's direction (input/output). Think of it as a cable bundle with multiple plugs for individual modules.
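
To make the cable-bundle picture concrete, here is a minimal sketch of an interface with two modports. The interface name `stream_if`, its valid/ready/data signals and the two small modules are assumptions chosen for illustration.

```systemverilog
// The "cable bundle": one declaration of all the wires.
interface stream_if (input logic clk, input logic rst);
  logic       valid;
  logic       ready;
  logic [7:0] data;

  // The "plugs": each modport fixes the direction seen from one side.
  modport source (input clk, rst, ready, output valid, data);
  modport sink   (input clk, rst, valid, data, output ready);
endinterface

module producer (stream_if.source tx);
  logic [7:0] count;

  // Advance the counter whenever the sink accepts a word.
  always_ff @(posedge tx.clk)
    if (tx.rst)        count <= '0;
    else if (tx.ready) count <= count + 1'b1;

  assign tx.valid = !tx.rst;
  assign tx.data  = count;
endmodule

module consumer (stream_if.sink rx, output logic [7:0] last);
  assign rx.ready = 1'b1;                  // always willing to accept

  always_ff @(posedge rx.clk)
    if (rx.rst)        last <= '0;
    else if (rx.valid) last <= rx.data;
endmodule

module top (input logic clk, input logic rst, output logic [7:0] last);
  stream_if link (.clk(clk), .rst(rst));   // one bundle instead of three wires

  producer u_src (.tx(link));
  consumer u_dst (.rx(link), .last(last));
endmodule
```

Adding a signal to the bundle then means editing the interface only, not every port list along the way.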

nctnico · « **Reply #11 on:** January 26, 2022, 12:24:24 am »

Quote from: rstofer on January 25, 2022, 12:32:30 am

The enums of SV would probably work as long as I can make the eventual value one-hot instead of integer (or something). I want to avoid the 7 to 100 demux to get the state for subsequent logic expressions. One-hot gives me just a single bit that I can use without worrying about decoding.

I'd let the synthesizer deal with such decisions. Using a large one-hot can easily create a lot of logic to encode the next state as well! Letting the synthesizer pick the best encoding also leaves you the choice of optimising for speed or size, which could be handy.

hamster_nz · « **Reply #12 on:** January 26, 2022, 03:22:28 am »

Quote from: rstofer on January 25, 2022, 12:32:30 am

The enums of SV would probably work as long as I can make the eventual value one-hot instead of integer (or something). I want to avoid the 7 to 100 demux to get the state for subsequent logic expressions. One-hot gives me just a single bit that I can use without worrying about decoding.

But unless your FSM is very sequential you pay for it on the other side, with the logic to work out which FSM state bit needs to be set next, and to possibly detect and recover from the "zero or more than one state bit is set" lockup.
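
A sketch of what such detect-and-recover logic could look like, assuming a reset state with only bit 0 set. The module name and ports are hypothetical. `$onehot` is a standard SystemVerilog system function, but synthesis support for it varies by tool, and a synthesizer may also strip logic for states it considers unreachable unless told otherwise.

```systemverilog
module onehot_guard_sketch #(
  parameter int N = 4
) (
  input  logic         clk,
  input  logic         rst,
  input  logic [N-1:0] next_state,   // from the next-state logic
  output logic [N-1:0] state,
  output logic         fault
);
  // Assumed reset state: only bit 0 set.
  localparam logic [N-1:0] RESET_STATE = {{(N-1){1'b0}}, 1'b1};

  // $onehot is true when exactly one bit is 1, so this flags both
  // "no bit set" and "more than one bit set".
  assign fault = !$onehot(state);

  always_ff @(posedge clk)
    if (rst || fault) state <= RESET_STATE;   // recover on the next clock
    else              state <= next_state;
endmodule
```

The checker itself costs logic that grows with the number of state bits, which is the price being pointed out above.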

rstofer · « **Reply #13 on:** January 26, 2022, 04:11:18 am »

Quote from: hamster_nz on January 26, 2022, 03:22:28 am

Quote from: rstofer on January 25, 2022, 12:32:30 am
The enums of SV would probably work as long as I can make the eventual value one-hot instead of integer (or something). I want to avoid the 7 to 100 demux to get the state for subsequent logic expressions. One-hot gives me just a single bit that I can use without worrying about decoding.

But unless your FSM is very sequential you pay for it on the other side, with the logic to work out which FSM state bit needs to be set next, and to possibly detect and recover from the "zero or more than one state bit is set" lockup.

Unless I have missed them, this is your first post in a while. I was wondering if everything was ok in NZ... Guess so!

My state names are assigned one-hot values by the synthesizer. When I set next_state <= <some state>, it is a constant and the synthesizer knows the value. I never mess with the 100 bit vectors. Can you imagine initializing that vector for each state? Even in hex, it would be a PITA.

rstofer · « **Reply #14 on:** January 26, 2022, 04:37:03 am »

Quote from: nctnico on January 26, 2022, 12:24:24 am

Quote from: rstofer on January 25, 2022, 12:32:30 am
The enums of SV would probably work as long as I can make the eventual value one-hot instead of integer (or something). I want to avoid the 7 to 100 demux to get the state for subsequent logic expressions. One-hot gives me just a single bit that I can use without worrying about decoding.
I'd let the synthesizer deal with such decisions. Using a large one-hot can easily create a lot of logic to encode the next state as well! Letting the synthesizer pick the best encoding also leaves you the choice of optimising for speed or size, which could be handy.

This setting was always an option (buried in the Preferences, I believe) in ISE. I haven't looked for it in Vivado.

I don't understand the encode issue. The state name has a constant value assigned by the synthesizer. In terms of size, yes, I have two vectors using 100 flops each. But I don't have to decode or encode anything. The state bit is set when I enter the state so any logic that occurs in that block of code is enabled by a single bit from the state number. It isn't small, I could do the same thing with just 7 flops per vector but I would ultimately have to create a 7-in 100-out decoder. If I didn't do it, the synthesizer would because one way or another, logic within a state is conditioned by knowing what state we're in.

I am only considering the two-process incantation of a FSM. I haven't thought through the one-process or three-process incantations.

When I was using ISE, there was a dialog about allowing the synthesizer to decide which method to use for state encoding. Then it gave some number of manual options. In that dialog, I thought it talked about one-hot for larger FSMs. Those toy FSMs in every text with about 4 states encoded into 2 bits aren't much of an FSM. Of course they will create some kind of decoder, it only takes a LUT. At some point, that method falls flat.

In any event, I don't specify one-hot in Vivado. I am just assuming that it will generate that style of encoding for the large number of states.

OTOH, there is a lot of discussion about "Safe State" and whether to set the option on the state registers. If set, Vivado generates logic to verify that the state value is legal - it has just a single '1'. I guess my project doesn't need to fly so I'm not going to worry about cosmic interference but it is a point against using one-hot. Not that binary state codes are inherently safer, they are just denser but a wrong state is a wrong state regardless of how it happens.

Berni · « **Reply #15 on:** January 26, 2022, 06:24:52 am »

One way is to remember the bit index instead of an enum value:
https://www.verilogpro.com/systemverilog-one-hot-state-machine/

This also has the side effect of forcing the compiler to handle the state as individual bits rather than as one value with a lot of bits (along with the large decode logic that comes with it).

Other times you can politely ask the compiler (in tool-specific ways) to make the machine one-hot. Most compilers recognize state machine structures in code so that they can take extra care in implementing them. So they are often also able to turn any state machine into a one-hot machine (and might even decide to do that on their own for small ones).

In any case, one should always check the compiler's final output (most tools can draw it as a block diagram) if you want it done a certain way, as a lot of the time the compiler gets a different idea.
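
A minimal sketch of the bit-index idea, written from the description above rather than copied from the linked article. The enum holds bit positions, and a "reverse case" on `1'b1` picks the branch for whichever state bit is set. The state names and the `start`/`busy` signals are assumptions.

```systemverilog
module onehot_index_fsm_sketch (
  input  logic clk,
  input  logic rst,
  input  logic start,
  output logic busy
);
  // The enum holds bit positions, not state values.
  typedef enum int { IDLE_IDX = 0, RUN_IDX = 1, DONE_IDX = 2 } state_idx_t;

  logic [2:0] state, next_state;

  always_comb begin
    next_state = '0;
    case (1'b1)                        // match the state bit that is set
      state[IDLE_IDX]: if (start) next_state[RUN_IDX]  = 1'b1;
                       else       next_state[IDLE_IDX] = 1'b1;
      state[RUN_IDX]:  next_state[DONE_IDX] = 1'b1;
      state[DONE_IDX]: next_state[IDLE_IDX] = 1'b1;
      default:         next_state[IDLE_IDX] = 1'b1;   // no bit set: back to idle
    endcase
  end

  always_ff @(posedge clk)
    if (rst) state <= 3'b001;          // only the IDLE bit set
    else     state <= next_state;

  assign busy = !state[IDLE_IDX];      // single-bit test, no decode
endmodule
```

Each branch condition is one flop output, so nothing compares the whole state vector against a constant.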

hamster_nz · « **Reply #16 on:** January 26, 2022, 06:29:21 am »

You can use vendor-specific attributes to get the result you want:

Code: [Select]

(* fsm_encoding = "one_hot" *) reg [7:0] my_state;

(From https://www.xilinx.com/support/documentation/sw_manuals/xilinx2019_2/ug901-vivado-synthesis.pdf)

Of course the optimal solution depends on your use-case. For speed, one-hot is usually king, but for resource usage a more compact encoding is usually optimal.

One of the more cunning ways I've seen is using a block RAM, with the address being the next state, and the output being the control signals. You can have 256 states and 72 output signals from a single block RAM, and it runs at the speed of a block RAM. Chain two blocks together and you get 144 outputs, and can run at 300+ MHz on low-end parts.
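
A toy-sized sketch of the memory-driven approach, assuming each ROM word holds a next-state field plus the control bits. The widths, the four-entry contents and the module name are invented for illustration, and a table this small would most likely end up in LUTs rather than a real block RAM.

```systemverilog
module rom_fsm_sketch (
  input  logic       clk,
  input  logic       rst,
  output logic [3:0] ctrl
);
  localparam int STATE_W = 2;
  localparam int CTRL_W  = 4;

  // Each word: {next_state, control outputs}. Contents are placeholders.
  logic [STATE_W+CTRL_W-1:0] rom [0:3];
  initial begin
    rom[0] = {2'd1, 4'b0001};
    rom[1] = {2'd2, 4'b0010};
    rom[2] = {2'd3, 4'b0100};
    rom[3] = {2'd0, 4'b1000};
  end

  logic [STATE_W+CTRL_W-1:0] word;
  logic [STATE_W-1:0]        next_state;

  // Synchronous read: the next-state field of the current word is the
  // address of the following word.
  always_ff @(posedge clk)
    if (rst) word <= rom[0];
    else     word <= rom[next_state];

  assign {next_state, ctrl} = word;
endmodule
```

This version only walks a fixed sequence. For branching, external inputs would have to be folded into the address as well, which multiplies the number of ROM entries.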

rstofer · « **Reply #17 on:** January 26, 2022, 03:26:56 pm »

Quote from: hamster_nz on January 26, 2022, 06:29:21 am

One of the more cunning ways I've seen is using a block RAM, with the address being the next state, and the output being the control signals. You can have 256 states and 72 output signals from a single block RAM, and it runs at the speed of a block RAM. Chain two blocks together and you get 144 outputs, and can run at 300+ MHz on low-end parts.

That sounds a lot like microcode - a technique I really like. I simply must do a project using that technique.

Way back in '74 or so, I built a microcoded version of the IBM 1130 computer and used blown-link PROMs to store the microcode. Unfortunately, static RAM hadn't quite filtered down to the hobby level and I scrapped the project when I got my Altair 8800. My FPGA version of the 1130 is a treat to use. I started coding FORTRAN on that machine back in '70 and now, 50+ years later, I still do it from time to time.

The AMD 2901 series bit-slice devices were really nice (for the time). They were always microcoded.

https://en.wikipedia.org/wiki/AMD_Am2900

With the modern HDLs, I fear we have lost interest in microcoding. Oddly, the LC3 project, if built from the book, is microcoded and Appendix C includes a form to fill out. The last page here:

https://people.cs.georgetown.edu/~squier/Teaching/HardwareFundamentals/LC3-trunk/docs/LC3-uArch-PPappendC.pdf

The book:
https://www.amazon.com/Introduction-Computing-Systems-Gates-Beyond/dp/0072467509

Does everybody remember that IBM invented the 8" floppy to load microcode into the System/370 at boot? Yes, end users could change the microcode but it wasn't recommended by IBM - or supported, I imagine.

First the 8" floppy, now the microSD device. We've come a long way in 50 years. 250 kB to 256 GB (or more).

Berni · « **Reply #18 on:** January 26, 2022, 05:22:04 pm »

I have resorted to a sort of microcode in FPGAs before.

One example is driving an MPU-bus display from an FPGA. These things need a bunch of initialization to start them up, and then need commands issued to draw pixels into the internal frame buffer. The state machine to do all that would be pretty hefty, so I instead created a small state machine that reads commands from a 'ROM' (made with RAM blocks, obviously). At startup it executes 'instructions' from the beginning of the ROM until it reaches an end instruction. Then, at the beginning of each frame, it jumps to the middle of the ROM and executes a sequence that places the display's cursor at the start and makes it ready to take data, followed by the instruction that makes the state machine pump pixel RGB values into it. Though I suppose this is less microcode and more like a tiny, very dumb single-purpose computer that executes a program. In any case, the result was a much smaller state machine that uses a lot fewer LUTs to get the job done.

I tend to do similar things when a peripheral (over SPI or similar) needs initialization: just play back the data out of a memory block.

free_electron · « **Reply #19 on:** January 26, 2022, 06:30:02 pm »

Quote from: Berni on January 26, 2022, 05:22:04 pm

I have resorted to a sort of microcode in FPGAs before.

i used to make accelerators that way.
on an ARM7 processor you can set the bus timing. i set the read timing all the way to the end of the cycle. the fpga has a clock generator that makes two clocks and a multiphase enable signal. One clock feeds the arm processor, the other clock feeds the fpga logic and the multiphase (non-overlapping) enables. i could create up to 8 time slots in one arm bus cycle (between the start of the cycle and the point where the arm actually 'latches' the data). for a write cycle the arm would latch at the beginning of the cycle. this gives me 7 more time slots to perform something in the fpga. for a read the arm would read at the end of the cycle, giving me another 7.

If the arm has to write something to the fpga to be 'executed' and read back the result i have 14 time slots to perform what i needed to do. No waitstates needed.
for example the ARM writes 'read adc channel 7' to the fpga. The fpga would set the channel mux , start the a/d , grab the data (serially) , scale it , sign adjust it , add an offset and put the data on the bus. by the time the processor does the read all that stuff has happened.
the transport to the a/d or peripheral ran at high clockspeed so i could turn around the data within the available time slots.

I had a programmable transport generator.
for example : read a byte , alter bits 3 to 5 with new payload and send out.

The 32 bit operation was encoded as : 8 bits for 'instruction', 3 argument bytes ( all in one 32 bit word )

Let's assume for a second that you have an adc control register with the following map :

[busy][channel (2:0)][gain (2:0)] that resides in register 0x14 of the adc

The following 'instruction' word would access 'channel(2:0)' :
0x22, 0x14, 0b00111000, 0x02 : FPGA instruction 0x22 : read address 0x14 of the ADC, alter bits 5 to 3 (indicated by the 00111000 mask) to 0x02 and write it back

So i would define a 'constant' 0x22143800 that encodes the 'FPGA instruction'.

To perform the operation on the adc all i needed to do is write the memory location

#define Set_ADC_Channel 0x22143800

in the arm code : (FPGA_execute is a memory location)
FPGA_execute = Set_ADC_Channel + 0x02; // the arm writes 0x22143802 to the FPGA 'instruction' register.
the fpga decodes the instruction : 0x22 means it needs to execute the following microcode :
- read memory location (0x14) from the adc
- alter bits 5:3 to 0x02
- write back to adc
- take reading from adc

so the next ARM instruction would be :
ADC_result = FPGA_readback;

no need to cache anything in the fpga or arm , no need for the arm to waste time doing bit-twiddling , shift operations, sign expansions or any other 'data massaging'. all the complicated 'transport' related stuff was handled by the FPGA as emulated operations.
All the bit fields that resided somewhere in registers , over a serial transport, were 'virtualized' as single arm operations. some transport was over spi , other over i2c. others parallel. the arm didn't care. it just said 'i need to modify these bits in that register to this.' and it could do that in a single memory access. the fpga did all the work.

combine that with true dual-port mailboxes and you could create a system that never needed any printf or scanf. the 'host' would write something in the mailbox, the fpga would set things up and fire the interrupt to the arm, the arm picks up its 'instruction' (from the host program on the pc) and arguments, and does its thing. some of those things are in turn accelerated by the fpga. the results are written back to the mailbox memory, which clears the wait flag so the host can read the answer. the host communicated in bulk usb packets with fixed frame length. one out, one in. Everything was timed to zero waitstates.

the host could do

for channel 0 to 7
print read_adc_channel(channel)
next
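
On the FPGA side, the "read, alter bits, write back" step described above could be sketched roughly like this. The field layout {opcode, register address, mask, value} follows the post. The module name, port names, and the rule that the value is aligned to the lowest set bit of the mask are assumptions, and the serial transport to the ADC is left out entirely.

```systemverilog
module rmw_sketch (
  input  logic [31:0] instr,     // {opcode, reg_addr, mask, value}
  input  logic [7:0]  old_val,   // value read back from the peripheral register
  output logic [7:0]  opcode,
  output logic [7:0]  reg_addr,
  output logic [7:0]  new_val    // value to write back
);
  logic [7:0] mask, value;

  // Split the 32-bit instruction word into its four bytes.
  assign {opcode, reg_addr, mask, value} = instr;

  // Replace only the masked bits; value is shifted up to the lowest set mask bit.
  function automatic logic [7:0] modify_field(
    input logic [7:0] old_v,
    input logic [7:0] m,
    input logic [7:0] v
  );
    int shift;
    shift = 0;
    for (int i = 7; i >= 0; i--)
      if (m[i]) shift = i;       // loop ends with the lowest set bit
    return (old_v & ~m) | ((v << shift) & m);
  endfunction

  assign new_val = modify_field(old_val, mask, value);
endmodule
```

With `instr = 32'h22143802` and a register that currently reads `8'h45`, the mask `8'h38` selects bits 5:3 and the write-back value works out to `8'h55`: busy and gain are untouched and the channel field becomes 2.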

nctnico · « **Reply #20 on:** January 26, 2022, 08:47:08 pm »

Quote from: rstofer on January 26, 2022, 04:37:03 am

Quote from: nctnico on January 26, 2022, 12:24:24 am
Quote from: rstofer on January 25, 2022, 12:32:30 am
The enums of SV would probably work as long as I can make the eventual value one-hot instead of integer (or something). I want to avoid the 7 to 100 demux to get the state for subsequent logic expressions. One-hot gives me just a single bit that I can use without worrying about decoding.
I'd let the synthesizer deal with such decisions. Using a large one-hot can easily create a lot of logic to encode the next state as well! Letting the synthesizer pick the best encoding also leaves you the choice of optimising for speed or size, which could be handy.

This setting was always an option (buried in the Preferences, I believe) in ISE. I haven't looked for it in Vivado.

I don't understand the encode issue.

Just ask yourself what the logic looks like to produce the next state. If you have cross references between states (from state 1 to state 3 and from state 2 to state 3 for example) then the logic that determines that the bit for state 3 should become '1' depends on the bits state 1 and state 2 AND the signals that influence going to state 3. In the end you are ending up with the same amount of logic! You are basically trying to push a balloon in a suitcase. What you push in from one side, comes out at the other side.
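
Written out for a three-state example, the point looks like this. The transitions and the `go_a`/`go_b` inputs are invented for illustration: state 1 goes to state 3 on `go_a` (else to state 2), state 2 goes to state 3 on `go_b` (else stays), and state 3 returns to state 1.

```systemverilog
module onehot_next_sketch (
  input  logic [3:1] state,        // one-hot: exactly one of state[1..3] is set
  input  logic       go_a,
  input  logic       go_b,
  output logic [3:1] next_state
);
  // Every way INTO a state shows up as one AND term in that state's bit.
  assign next_state[1] = state[3];
  assign next_state[2] = (state[1] & ~go_a) | (state[2] & ~go_b);
  assign next_state[3] = (state[1] &  go_a) | (state[2] &  go_b);
endmodule
```

No decoder appears on the output side, but each next-state bit needs one term per incoming transition, so a state with many predecessors gets a wide OR in front of its flop.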

As Berni stated: if speed and/or size are of concern, then a programmable state machine is a much better option. You get to reuse a lot of the logic while allowing complicated state changes. The PicoBlaze (from Xilinx) comes to mind.

rstofer · « **Reply #21 on:** January 26, 2022, 11:04:18 pm »

I have no idea how next_state is generated. Next time I play with Vivado, I'll come up with a test program with some non-trivial number of states. I suppose I should look at 1, 2 & 3 process FSMs with binary, gray and one-hot encoding. I can look at timing and resources.

hamster_nz · « **Reply #22 on:** January 26, 2022, 11:34:43 pm »

Quote from: rstofer on January 26, 2022, 11:04:18 pm

I have no idea how next_state is generated. Next time I play with Vivado, I'll come up with a test program with some non-trivial number of states. I suppose I should look at 1, 2 & 3 process FSMs with binary, gray and one-hot encoding. I can look at timing and resources.

I was trying to think when Gray coding for an FSM would be of use... fully async designs? when the state is used in a different clock domain?

Anybody got any use-cases that make it seem the representation of choice?

Someone · « **Reply #23 on:** January 27, 2022, 12:11:03 am »

Quote from: hamster_nz on January 26, 2022, 11:34:43 pm

Quote from: rstofer on January 26, 2022, 11:04:18 pm
I have no idea how next_state is generated. Next time I play with Vivado, I'll come up with a test program with some non-trivial number of states. I suppose I should look at 1, 2 & 3 process FSMs with binary, gray and one-hot encoding. I can look at timing and resources.

I was trying to think when Gray coding for an FSM would be of use... fully async designs? when the state is used in a different clock domain?

Anybody got any use-cases that make it seem the representation of choice?

Gray is a special case of binary coding, and the mapping from states to encodings is arbitrary anyway, so there should be no difference between them other than the optimization goal: where sequences are known (fixed or more probable), they can be preferentially encoded:
https://en.wikipedia.org/wiki/State_encoding_for_low_power
That article mentions the need for a different trade-off depending on the lut/register ratio, so back to implementation specific.

SiliconWizard · « **Reply #24 on:** January 27, 2022, 01:06:44 am »

Quote from: hamster_nz on January 26, 2022, 11:34:43 pm

Quote from: rstofer on January 26, 2022, 11:04:18 pm
I have no idea how next_state is generated. Next time I play with Vivado, I'll come up with a test program with some non-trivial number of states. I suppose I should look at 1, 2 & 3 process FSMs with binary, gray and one-hot encoding. I can look at timing and resources.

I was trying to think when Gray coding for an FSM would be of use... fully async designs? when the state is used in a different clock domain?

Anybody got any use-cases that make it seem the representation of choice?

As you know, Gray coding has the property that consecutive values differ by only 1 bit. So the first remark is: for an FSM, it will make a difference only if you always cycle through the states in the same order (which allows you to use consecutive Gray codes for states); otherwise it won't make much of a difference. But if you can ensure the order of states is fixed for a given FSM, the fact that only 1 bit flips at each transition can both limit power consumption (on a typical FPGA it's probably not going to matter much, but on an ASIC it could) and avoid glitches. Glitches are usually not going to be a problem if your FSM is synchronous (and timing requirements are met), but for asynchronous FSMs they could be an issue. That is probably not going to matter on FPGAs, because pure asynchronous FSMs are probably a rare beast, but again, for general digital design (on ASIC for instance), there could be some use cases.
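
For the fixed-order case, a small sketch of a state register that steps through consecutive Gray codes, using the usual binary-to-Gray conversion `bin ^ (bin >> 1)`. The module name and the free-running counter are assumptions for illustration.

```systemverilog
module gray_state_sketch #(
  parameter int W = 3
) (
  input  logic         clk,
  input  logic         rst,
  output logic [W-1:0] gray_state
);
  logic [W-1:0] bin, bin_next;

  assign bin_next = bin + 1'b1;

  always_ff @(posedge clk)
    if (rst) begin
      bin        <= '0;
      gray_state <= '0;
    end else begin
      bin        <= bin_next;
      // Registered Gray value of the next binary count.
      gray_state <= bin_next ^ (bin_next >> 1);
    end
endmodule
```

With W = 3 the register walks 000, 001, 011, 010, 110, 111, 101, 100 and wraps back to 000, changing one bit per step. As soon as the FSM can jump out of that order, the one-bit property is lost.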

One-hot encoding will change 2 bits from one state to the next, but it will be consistent for any state change, whatever the order.

All in all, on FPGAs, I've never seen a need for gray encoding for FSMs, and I've almost only ever used one-hot encoding. This is also often the default for FPGA synthesis tools.
But if you're not targeting FPGAs, then I guess it could be needed. Just my 2 cents - do not hesitate to show how Gray coding could be needed/make a real difference on FPGAs, and in which contexts.

