I did this recently. I bought a Terasic DE0-Nano board, which has an Altera Cyclone IV device on it with about 22K logic elements, some external flash and SDRAM, and a few other peripherals (accelerometer etc.). It's got enough capacity to synthesize the Altera Nios II CPU core plus peripherals, or even the OpenRISC core from the OpenCores website (I believe someone even had uClinux running on it).
I used the board with the Altera Quartus software. Altera also has a free version of ModelSim that you can use to simulate your design and view the logic states (like a logic analyser display, but for simulation).
As for learning, I'm fortunate that I work with a number of IC designers, who recommended I use Verilog for the RTL code. For everything else I just looked online for examples and tutorials, and I was able to pick up the basics reasonably quickly. As an embedded software engineer who mostly uses C, though, it took a while to get my head around some of the constraints!
The dev board is excellent though. It's tiny (the size of a credit card), runs off the USB cable that you use to program it, and has everything on it that you need to get started. Most importantly, it's cheap: I paid about £69, I think, from Farnell in the UK. The only real downside I see is that if you move on to doing your own board with an FPGA on it, you can't use the eval board as a programmer for it (as far as I can tell). But that's not the end of the world, as Altera ByteBlaster clones are available on eBay for very little anyway.
I implemented a time-to-digital converter as an SPI slave device, which used up just a few percent of the device's capacity.
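
To give a rough idea of what a time-to-digital converter does, here is a small behavioural model in Python. It is not my RTL and not Verilog, just a sketch of the simplest counter-based approach: count clock periods between a start and a stop event, then shift the count out MSB first the way an SPI slave would. The 16-bit result width, the use of the board's 50 MHz oscillator as the counter clock, and all of the function names are assumptions for illustration only.

```python
# Behavioural sketch of a counter-based time-to-digital converter (TDC)
# read out over SPI. Illustration only: names and widths are assumptions.

CLOCK_HZ = 50_000_000   # assumed counter clock (the DE0-Nano has a 50 MHz oscillator)
COUNTER_BITS = 16       # assumed width of the result register


def measure_ticks(interval_ns):
    """Count whole clock periods in the interval, saturating at full scale."""
    ticks = interval_ns * CLOCK_HZ // 1_000_000_000
    return min(max(ticks, 0), (1 << COUNTER_BITS) - 1)


def spi_shift_out(value):
    """Bits the slave would drive on MISO, MSB first, one per SCLK cycle."""
    return [(value >> i) & 1 for i in range(COUNTER_BITS - 1, -1, -1)]


def spi_master_read(bits):
    """Reassemble the word on the master side from the MSB-first bit stream."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


def ticks_to_ns(ticks):
    """Convert a tick count back to nanoseconds (20 ns per tick at 50 MHz)."""
    return ticks * 1_000_000_000 // CLOCK_HZ


if __name__ == "__main__":
    ticks = measure_ticks(1000)                  # a 1 microsecond interval
    received = spi_master_read(spi_shift_out(ticks))
    print(received, "ticks =", ticks_to_ns(received), "ns")
```

With a plain counter like this the resolution is one clock period, so 20 ns at 50 MHz. In hardware the same thing is a counter, a result register and a shift register, which is why a design like this takes so little of the FPGA.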