I'll preface this with the caveat that I have no idea what you're trying to debug, or whether the issues are in the design itself; also, you asked for simple, so this might be more in the realm of "complicated but extremely useful given the right scenario".
For the purpose at hand I did find the bug that prompted me to start this thread.
What I was doing is nothing too complicated FPGA-wise: a multi-channel, GPS-disciplined camera trigger controller listening on an I2C bus. An MCU could kind of do it, but it would be a bitch to get the timings tight to less than 100 ns. Because it's proprietary I couldn't just drop a GPL I2C core in; I had to write it myself, and the NXP I2C spec is not ultra clear (e.g. exactly when does the master release SDA so that the slave may ACK?).
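For what it's worth, my reading of the spec is that the master releases SDA after the falling SCL edge that ends the eighth data bit, so the slave can pull the line low during the ninth clock. A toy Python model of one byte transfer on an ideal open-drain bus (the function names and the ideal-bus resolution are mine, not from the spec; treat it as a sketch of the timing, not a reference):

```python
def bus(*drv):
    # Open-drain resolution: the wire reads high (pull-up) unless
    # some device actively drives it low.
    return 0 if 0 in drv else 1

def write_byte(byte, slave_acks=True):
    """One I2C byte, MSB first. The master drives SDA for bits 7..0,
    then releases SDA (i.e. lets the pull-up win) after the eighth
    falling SCL edge, so the slave can pull it low during the ninth
    clock to ACK. Returns the SDA levels sampled at each SCL-high
    phase, plus whether the master saw an ACK."""
    wire = []                        # SDA sampled at each SCL-high
    for i in range(7, -1, -1):
        m = (byte >> i) & 1          # master's SDA during data bits
        wire.append(bus(m, 1))       # slave keeps SDA released
    m = 1                            # ninth clock: master releases SDA
    s = 0 if slave_acks else 1       # slave pulls low to ACK
    wire.append(bus(m, s))
    ack = (wire[-1] == 0)            # ACK = SDA low on the 9th clock
    return wire, ack
```

A NACK then just falls out of the model: if the slave never drives SDA on the ninth clock, the pull-up keeps the line high and the master reads 1.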
Here's my typical steps for writing something.
1. Lint design as I write it (I like Verilator's linting tool a lot)
Check; I compile with yosys and iverilog with -Wall, but I couldn't get Verilator to like the SiliconBlue cell library simulation models.
2. Testbench for each individual module you have within the design (and the associated regression suite, something simple like Make with some grepping works fine)
Almost check: I have benches for the complex modules but not for simple ones where I don't suspect bugs (e.g. the clock divider or reset controller).
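The regression glue really doesn't need to be more than a few lines of "Make with some grepping". A Python sketch of the same idea, assuming the convention that every bench $display's PASS or FAIL (the iverilog/vvp flags are illustrative, not gospel):

```python
import glob
import subprocess

def bench_passed(log_text):
    """A bench passes iff its log contains at least one PASS
    and no FAIL -- the 'grep the output' convention."""
    lines = log_text.splitlines()
    return (any("PASS" in l for l in lines)
            and not any("FAIL" in l for l in lines))

def run_bench(tb_file):
    """Compile and run one iverilog testbench, return (name, ok)."""
    subprocess.run(["iverilog", "-Wall", "-o", "a.out", tb_file],
                   check=True)
    out = subprocess.run(["vvp", "a.out"],
                         capture_output=True, text=True)
    return tb_file, bench_passed(out.stdout)

def regression():
    """Run every tb_*.v in the current directory (naming is mine)."""
    return [run_bench(f) for f in sorted(glob.glob("tb_*.v"))]
```

The nice property is that adding a bench to the regression is just dropping a file with the right name; nothing else to maintain.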
3. Testbench for top level of design also added to regression.
Haven't been doing that but I think I should put the effort into it.
4. Check timing analysis as you go (Xilinx has this, Lattice must have it in their real tool, I'm not sure if the opensource tools support this at all)
icetime gives maximum frequencies and critical path delays, but with generated symbol names it's kind of unreadable; it looks like this:
2.793 ns net_26661 ($abc$52038$techmap\S_GRN.$3\q[31:0][14]_new_inv_)
odrv_7_18_26661_26803 (Odrv4) I -> O: 0.372 ns
t7449 (Span4Mux_v4) I -> O: 0.372 ns
t7448 (LocalMux) I -> O: 0.330 ns
inmux_9_17_38161_38188 (InMux) I -> O: 0.260 ns
lc40_9_17_3 (LogicCell40) in0 -> lcout: 0.449 ns
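Unreadable as it is, the report is regular enough to post-process. A quick parse of the per-cell lines above into (instance, cell type, delay) tuples; the pattern is guessed from that one report, so adjust it if your icetime output differs:

```python
import re

# Matches icetime critical-path lines like:
#   odrv_7_18_26661_26803 (Odrv4) I -> O: 0.372 ns
CELL = re.compile(r"(\S+)\s+\((\S+)\)\s+\S+\s+->\s+\S+:\s+([\d.]+)\s+ns")

def path_delays(report):
    """Return ([(instance, cell_type, delay_ns), ...], total_ns)
    for every per-cell hop found in an icetime report."""
    hops = []
    for line in report.splitlines():
        m = CELL.search(line)
        if m:
            hops.append((m.group(1), m.group(2), float(m.group(3))))
    return hops, round(sum(d for _, _, d in hops), 3)
```

From there it's easy to sort hops by delay or histogram the cell types to see where the path is actually spending its time.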
5. Gate level sim with generated .sdf file (depending on the scope of your project, doing GLS on the entire thing might be a bit much)
So with laugensalm's tip I can run LUT-level sims with iverilog, but I'm not sure it really adds anything over a top-level Verilog simulation. I'll have to check whether the SiliconBlue cell libs have timing information and how well iverilog makes use of it.
With all of those steps done when you put the design on the part it should work assuming your testbenches properly tested the behavior. If your design doesn't work and you truly can't sim it then there are a few options depending on the complexity of the design.
Like I was saying, the main issue is when your FPGA has to talk to the outside world (and it usually does, otherwise why are you using an FPGA?), and then what? You have to model all the external things in Verilog too, if you can. Some people said "co-simulation" but I'm not sure I understand the concept.
1. Simple Designs - UART or Blinky or Pins and logic analyzer for debug as you mentioned
Done
2. Complicated Bus Based Design - Internal Bus Scope and UART.
UART done; for the next time I think I'll define a JTAG-like daisy-chained debug bus with a fixed number of signals and some preprocessor macros to make it easier. If needed I can modify the Yosys code to have it output what's necessary.
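The daisy-chain idea, roughly: each module gets a shift register snapshotting some internal signal, the registers are chained TDI-to-TDO style, and you clock the whole chain out serially through one pin pair. A toy Python model of the readout (the class names and widths are mine, this is just to show the mechanics):

```python
class ChainNode:
    """One debug node: a shift register holding a captured value."""
    def __init__(self, width, value):
        self.width = width
        self.reg = value            # snapshot to shift out

    def shift(self, tdi):
        """Shift one bit in (LSB-first), return the bit pushed out."""
        tdo = self.reg & 1
        self.reg = (self.reg >> 1) | (tdi << (self.width - 1))
        return tdo

def read_chain(nodes):
    """Clock the whole chain: feed zeros in at the head, collect every
    bit that falls out of the tail. Bits come out LSB-first, last node
    in the chain first."""
    total = sum(n.width for n in nodes)
    out = []
    for _ in range(total):
        bit = 0
        for n in nodes:             # tdi of node k+1 is tdo of node k
            bit = n.shift(bit)
        out.append(bit)
    return out
```

The appeal is that adding a module to the chain costs two wires and a macro instantiation, and the host side only ever needs to know the per-node widths to slice the bitstream back apart.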
Elaborating on 2, most designs make use of some internal bus like AXI or APB or Wishbone, etc. Designs this complex usually have issues within the communications of different masters and slave devices. The Bus Scope is effectively an internal logic analyzer storing every transaction that occurred on the bus for N cycles. The caveat is that it uses resources on your chip, and the depth is directly related to how much ram you can give up. Your UART will act as a master and allow you to probe the bus scope for its internal data. You can set an internal trigger wire to tell the scope to record. A simple python script then formats that data for you to debug.
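To give an idea of the "simple python script" end of that flow, here's the sort of formatter I mean. The capture word layout (32-bit words with a write flag, address, and data field) is entirely made up for the example; map the slicing to whatever your bus scope actually stores per cycle:

```python
def format_trace(words):
    """Pretty-print raw bus-scope capture words.
    Assumed (made-up) layout per 32-bit word:
      bit  31    : 1 = write, 0 = read
      bits 30-16 : address
      bits 15-0  : data
    """
    rows = []
    for i, w in enumerate(words):
        kind = "WR" if (w >> 31) & 1 else "RD"
        addr = (w >> 16) & 0x7FFF
        data = w & 0xFFFF
        rows.append(f"{i:4d}  {kind}  addr=0x{addr:04x}  data=0x{data:04x}")
    return "\n".join(rows)
```

Pipe the UART dump through this and the trace becomes greppable, which matters once the capture depth gets into the thousands of cycles.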
Like I was saying there's a nice 512 kiB SRAM chip on the dev board, I could use it to store traces.
Another option is to debug it on an ECP5, which has a JTAG port.
ZipCPU has a nice writeup on his Wishbone bus scope (https://github.com/ZipCPU/wbscope) that I enjoyed, and it worked when I tried it out.
I'll check. Instantiating a CPU (a PicoRV32 or something) for debugging/self-test feels a bit excessive, but that's more of a gut feeling; rationally it's not necessarily a stupid idea.
Alternatively, I know Xilinx has an internal scope IP you can drag and drop into your design (https://www.xilinx.com/products/intellectual-property/chipscope_ila.html), and I've been told Altera does as well. Maybe Lattice does too (this seems like it: https://www.latticesemi.com/-/media/LatticeSemi/Documents/UserManuals/RZ/Reveal34UserGuide.ashx?document_id=50887)? The Xilinx one even allows you to bind to individual wires rather than just the bus itself.
Again, this might be a bit much, but I don't really know what you're trying to debug here; knowing might help a bit.
Thanks for the tips. I may end up switching to the proprietary toolchain, but I'm slightly allergic to Windows-style clicky-feely interfaces; I'm more of an Emacs+make-in-a-terminal guy.
BTW, thanks to everyone for the useful information (and the guilt trips about not writing full benches).
The core works, so I'll switch back to the analog side of things.