There's another way to architect an FPGA CPU that is almost never seen in the wild: build up a library of 74xx(x) blocks, then 'wire' the blocks together just like you would on a wire-wrap board. Build modules for the 7400, 7402, 7404, 7408, 7474, 74181, 74182 and all the others that come up. This is a terrific way to use logic chips as the design elements rather than the more conventional finite-state automata (FSAs) and other HDL constructs. It would be a very interesting design! Elegant and unique!
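To make the 'wiring' idea concrete, here is a rough Python sketch (in the real design each chip would be a VHDL entity, and the names here are made up for illustration). One function models a single 7400 NAND section, and four sections of the same package are wired into an XOR, exactly as you'd wrap it on a board:

```python
def nand(a, b):
    """One section of a 7400 quad 2-input NAND (signals are 0/1)."""
    return 0 if (a and b) else 1

def xor_from_7400(a, b):
    """'Wire' the four NAND sections of a single 7400 into an XOR gate."""
    n1 = nand(a, b)        # section 1
    n2 = nand(a, n1)       # section 2
    n3 = nand(b, n1)       # section 3
    return nand(n2, n3)    # section 4 drives the output pin

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_from_7400(a, b))
```

Each net (n1, n2, n3) corresponds to a wire between pins; the schematic and the code have the same shape.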
You can design the 74181 block in behavioral code (specify the inputs, outputs and function) or model it with lower-level logic chips from the datasheet block diagram. I might do the gate-level version in a later pass, but behavioral code would be my first choice. Changing to the more detailed design would be as simple as substituting one file for the other and recompiling.
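A full 74181 model is too long to show here, so as a small stand-in, this Python sketch uses a 1-bit full adder to show the behavioral-versus-structural swap. Both versions have the same interface, so either can be dropped in, and an exhaustive check confirms they agree. The gate function names are hypothetical:

```python
from itertools import product

# Single gate sections of a few 74xx parts; signals are 0/1 ints.
def g7408(a, b):
    """One section of a 7408 quad 2-input AND."""
    return a & b

def g7432(a, b):
    """One section of a 7432 quad 2-input OR."""
    return a | b

def g7486(a, b):
    """One section of a 7486 quad 2-input XOR."""
    return a ^ b

def full_adder_behavioral(a, b, cin):
    """Behavioral version: just state the function."""
    total = a + b + cin
    return total & 1, total >> 1            # (sum, carry_out)

def full_adder_structural(a, b, cin):
    """Structural version: the same block wired from 7486/7408/7432 sections."""
    p = g7486(a, b)                         # propagate term
    s = g7486(p, cin)
    cout = g7432(g7408(a, b), g7408(p, cin))
    return s, cout

# Swapping one implementation for the other must not change behavior.
for a, b, cin in product((0, 1), repeat=3):
    assert full_adder_behavioral(a, b, cin) == full_adder_structural(a, b, cin)
print("behavioral and structural models agree on all 8 input combinations")
```

With only 8 input combinations the check is exhaustive; for a real 74181 (14 inputs) the same brute-force comparison is still only 16384 cases.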
I would still lay out a microcode word and use microcode for the control unit. Among other things, the typical large FSA isn't needed. We know that, in the end, the FSA will look a lot like a microcoded solution, but one big difference is that the FSA will probably use 'one-hot' encoding, so an FSA with 100 states will have 'state' and 'next_state' vectors that are 100 bits wide. We don't want that for our microcode word width, so many fields will be binary-encoded, like MUX select bits: three bits select 1 of 8 mutually exclusive values rather than having 8 separate signals going to AND gates.
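The width arithmetic is easy to check. This short Python sketch computes how many bits a binary-encoded field needs and decodes a select value back into one-hot lines, which is what a decoder chip does in hardware (function names are illustrative only):

```python
def encoded_width(n_choices):
    """Bits needed to binary-encode n mutually exclusive choices."""
    return max(1, (n_choices - 1).bit_length())

def decode(select, width):
    """Binary select field -> one-hot lines (like a 74138, but active-high)."""
    if not 0 <= select < (1 << width):
        raise ValueError("select value does not fit in the field")
    return [1 if i == select else 0 for i in range(1 << width)]

print(encoded_width(100))   # 7 bits, versus 100 bits one-hot
print(decode(5, 3))         # [0, 0, 0, 0, 0, 1, 0, 0]
```

The decode happens once, next to the MUX or gates it feeds, so the microcode word only carries the 3 (or 7) encoded bits.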
In real hardware, it is common to use tri-state bus drivers to send signals from registers to a shared bus. This doesn't work well in FPGAs, since current FPGA fabric generally has no internal tri-state buses, so we have to implement MUXes instead. It turns out the synthesis tool will do that automatically if you describe tri-state drivers on an internal bus.
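Here's a rough Python sketch of why the substitution works: a tri-state bus with one-hot output enables gives the same value as an AND-OR MUX, which in turn is the same as indexing with a binary-encoded select field. This is a behavioral illustration under the assumption that exactly one driver is enabled, not what any particular synthesis tool emits:

```python
def tristate_bus(values, enables):
    """Bus with one tri-state driver per register; None means the bus floats."""
    driving = [v for v, en in zip(values, enables) if en]
    if len(driving) > 1:
        raise ValueError("bus contention: more than one driver enabled")
    return driving[0] if driving else None

def and_or_mux(values, enables):
    """Roughly what synthesis builds instead: AND each source with its enable, OR the results."""
    result = 0
    for v, en in zip(values, enables):
        if en:
            result |= v
    return result

def encoded_mux(values, select):
    """Same bus, driven by a binary-encoded select field from the microcode word."""
    return values[select]

regs = [0x12, 0x34, 0x56, 0x78]
for sel in range(len(regs)):
    enables = [int(i == sel) for i in range(len(regs))]
    assert tristate_bus(regs, enables) == and_or_mux(regs, enables) == encoded_mux(regs, sel)
```

One difference worth noting: with no enable active, the tri-state bus floats while the AND-OR version reads 0, so the microcode shouldn't rely on a floating bus.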
Everything that needs to happen and the sequence of events is contained in microcode.
Not many people remember, but IBM invented the 8" floppy. Why? To load microcode into the System/370 mainframes. https://en.wikipedia.org/wiki/History_of_the_floppy_disk
For the meta-assembler (and you would probably need one), you define a word width. Then you define 'fields' within the word (or define the word as a collection of fields). Perhaps 3 bits are allocated to the select input of an 8-input MUX, something like that. Further, you define default values for each field so that each line of microcode doesn't have to specify every field, many of which are not involved in the current operation.
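A minimal Python sketch of the field definitions, assuming a hypothetical 16-bit word with four made-up fields. Each field has a position, a width and a default, and `pack` builds a word from whichever fields a line mentions:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Field:
    name: str
    lsb: int      # bit position of the field's least significant bit
    width: int    # field width in bits
    default: int  # value used when a microcode line doesn't mention the field

WORD_WIDTH = 16

# Hypothetical layout, for illustration only.
FIELDS = [
    Field("alu_op", 0, 4, 0),   # e.g. the 74181 S3..S0 inputs
    Field("bus_src", 4, 3, 0),  # select input of an 8-input MUX
    Field("ld_acc", 7, 1, 0),   # load enable for an accumulator register
    Field("next", 8, 8, 0),     # next microcode address
]

def check_layout(fields, word_width):
    """Make sure the fields fit in the word and don't overlap."""
    used = 0
    for f in fields:
        mask = ((1 << f.width) - 1) << f.lsb
        if used & mask or f.lsb + f.width > word_width:
            raise ValueError(f"field {f.name} overlaps another field or overflows the word")
        used |= mask

def pack(values):
    """Build one microcode word from {field_name: value}, filling in defaults."""
    unknown = set(values) - {f.name for f in FIELDS}
    if unknown:
        raise KeyError(f"unknown field(s): {sorted(unknown)}")
    word = 0
    for f in FIELDS:
        v = values.get(f.name, f.default)
        if not 0 <= v < (1 << f.width):
            raise ValueError(f"{f.name}={v} does not fit in {f.width} bits")
        word |= v << f.lsb
    return word

check_layout(FIELDS, WORD_WIDTH)
print(hex(pack({"alu_op": 9, "bus_src": 5, "next": 0x12})))   # 0x1259
```

Defaults are what make the source readable: the line above never mentions `ld_acc`, so it gets 0.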
Each line of 'code' defines values for the fields at a particular address in the microcode store. Extra credit for macros. Maybe use the C preprocessor? There's a simple Python program in here, screaming to get out! But, really, it simply MUST be written in Fortran. Characters and strings are such a PITA in Fortran! We had to rewrite IBM's Commercial Subroutine Package for the IBM 1130 to get some speed out of business applications, circa 1970, and the main thrust was string manipulation.
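The Python version might start out something like this. The source syntax (`addr: field=value, ...`, `;` comments, `.macro NAME body`) is invented here purely for illustration; the parser turns each line into the field values for one address, expanding one level of macros, with explicit fields overriding the macro's:

```python
def _expand(body, macros):
    """Split 'a=1, FETCH, b=2' into items, replacing bare macro names with their bodies."""
    items = []
    for item in (s.strip() for s in body.split(",")):
        if "=" in item:
            items.append(item)
        else:
            items.extend(s.strip() for s in macros[item].split(","))
    return items

def parse_microcode(source):
    """Return {address: {field_name: value}} from hypothetical microcode source text."""
    macros = {}
    program = {}
    for lineno, raw in enumerate(source.splitlines(), 1):
        line = raw.split(";", 1)[0].strip()        # ';' starts a comment
        if not line:
            continue
        if line.startswith(".macro"):
            _, name, body = line.split(None, 2)
            macros[name] = body
            continue
        addr_text, body = line.split(":", 1)
        addr = int(addr_text, 0)
        if addr in program:
            raise ValueError(f"line {lineno}: address {addr:#x} defined twice")
        fields = {}
        for item in _expand(body, macros):
            name, value = item.split("=")
            fields[name.strip()] = int(value, 0)  # later items override earlier ones
        program[addr] = fields
    return program

SRC = """
.macro FETCH bus_src=1, ld_ir=1   ; put PC on the bus, load IR
0x00: FETCH, next=0x01
0x01: alu_op=9, ld_acc=1
"""
print(parse_microcode(SRC))
```

Fields a line doesn't mention are simply absent here; the defaults get applied when each entry is packed into a word.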
In the end, you wind up defining every bit in every word in the microcode store.
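One way to make sure every bit is defined is to expand the sparse assembler output into a full control-store image, filling unused locations with a known word. In this sketch the fill word is a hypothetical choice (say, one that jumps to a halt or error routine):

```python
def build_image(program, depth, fill_word):
    """Expand a sparse {address: word} map into a full control-store image.

    Every unused location gets fill_word, so no bit in the store is left undefined.
    """
    image = [fill_word] * depth
    for addr, word in program.items():
        if not 0 <= addr < depth:
            raise ValueError(f"address {addr:#x} is outside a {depth}-word store")
        image[addr] = word
    return image

print([hex(w) for w in build_image({0: 0x1259, 2: 0x0001}, 4, 0xFFFF)])
```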
Now, the microcode wants to be alterable and not totally compiled into the VHDL.
Hit up hamster_nz for his code that reads BlockRAM values from a hex file during the Implementation phase rather than way back at Synthesis. I don't recall where I found it, but I used it for my LC-3 project.
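On the assembler side, the last step is writing that hex file. I don't know the exact format hamster_nz's code expects, so this Python sketch assumes the common plain layout of one zero-padded hex word per line (the kind Verilog's `$readmemh` or a VHDL `std.textio` loop can read), plus a reader for round-trip checking:

```python
import os
import tempfile

def write_hex(path, image, word_width):
    """Write one zero-padded, upper-case hex word per line."""
    digits = (word_width + 3) // 4
    with open(path, "w") as f:
        for word in image:
            f.write(f"{word:0{digits}X}\n")

def read_hex(path):
    """Read the file back, skipping blank lines."""
    with open(path) as f:
        return [int(line, 16) for line in f if line.strip()]

image = [0x1259, 0xFFFF, 0x0001]
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "ucode.hex")
    write_hex(path, image, 16)
    with open(path) as f:
        text = f.read()
    assert read_hex(path) == image
print(text, end="")
```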
Extra credit if you use an SD card to hold the microcode and can come up with a simple way to read the file during boot. Heck, IBM did it way back in the late '60s.
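For the SD card, one simple approach is a raw image file that a small boot state machine reads sequentially and streams into BlockRAM. The format below is entirely hypothetical: a 4-byte tag, a word count, bytes per word, the big-endian words, and a 16-bit additive checksum so the loader can refuse a bad card. The Python side packs and verifies it:

```python
import struct

MAGIC = b"UCOD"                   # hypothetical tag so the loader can reject the wrong file
HEADER = struct.Struct(">4sHB")   # magic, word count, bytes per word (big-endian)
TRAILER = struct.Struct(">H")     # 16-bit additive checksum of the payload bytes

def pack_image(words, bytes_per_word):
    """Build the raw image the hardware loader would read from the card."""
    body = b"".join(w.to_bytes(bytes_per_word, "big") for w in words)
    checksum = sum(body) & 0xFFFF
    return HEADER.pack(MAGIC, len(words), bytes_per_word) + body + TRAILER.pack(checksum)

def unpack_image(blob):
    """Verify and decode an image, mirroring what the boot loader would check."""
    magic, count, bpw = HEADER.unpack_from(blob, 0)
    if magic != MAGIC or bpw == 0:
        raise ValueError("not a microcode image")
    start = HEADER.size
    end = start + count * bpw
    if len(blob) < end + TRAILER.size:
        raise ValueError("image is truncated")
    body = blob[start:end]
    (checksum,) = TRAILER.unpack_from(blob, end)
    if sum(body) & 0xFFFF != checksum:
        raise ValueError("checksum mismatch")
    return [int.from_bytes(body[i:i + bpw], "big") for i in range(0, len(body), bpw)]

blob = pack_image([0x1259, 0xFFFF, 0x0001], 2)
print(unpack_image(blob))
```

An additive checksum is cheap to compute in hardware with one adder while the bytes stream past; a CRC would catch more errors but costs more logic.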
ETA: Once you have the FPGA version working, it would be pretty trivial to use the same blocks to design/build real hardware. Think of the FPGA as a high end simulation.