You want a 100 nF decoupling cap close to each VDD pin, plus one 4.7 µF bulk cap *somewhere*. Just put the bulk cap next to one of the other caps where you have space on the board. You didn't say whether you're going to use any of the analog features of the chip, but it's a good idea to assume you will someday. The analog supply/reference pin (VDDA, or VREF+ where the package has one) should probably be tied to VDD (you said this was a simple board), so it'll need to be decoupled too.
To use SWD, you need to bring out SWDIO, SWCLK and nRST (plus VDD and GND). You have to connect your VDD to the debugger's VDD line so the debugger can sense the target voltage, but that line won't power your board; your circuit needs to supply its own power. The debug header on your discovery board has six pins, but you don't really need the last one (SWO, the trace output).
For STM32, you *must* tie the BOOT0 pin to a stable logic level, otherwise the chip won't reliably boot the way you expect after a reset. Just connect it to GND and the MCU will boot from flash.
The chip has internal RC oscillators that can generate the high-speed system clock as well as a lower-speed clock for the RTC. They aren't very accurate though, so if you actually care about "real time", you'll need a 32.768 kHz crystal. Load capacitance is important, and ST has an application note (AN2867, the oscillator design guide) that tells you how to pick a crystal that'll work and what capacitors you'll need to go with it.
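To put rough numbers on that, here's a back-of-envelope sketch in C. It assumes the usual approximation of two equal load caps in series plus some stray capacitance. The function names and example values are mine, not ST's, and the real load capacitance has to come from your crystal's datasheet.

```c
/* Seconds gained or lost per day by a clock that is off by `ppm`
 * parts per million (86400 seconds in a day). */
double drift_seconds_per_day(double ppm)
{
    return ppm * 1e-6 * 86400.0;
}

/* Value of each of the two (equal) load capacitors, in pF.
 * Assumes C_L = (C1 * C2) / (C1 + C2) + C_stray with C1 == C2,
 * which rearranges to C1 = C2 = 2 * (C_L - C_stray).
 * C_stray is a guess for pin and trace capacitance. */
double load_cap_each_pf(double cl_pf, double cstray_pf)
{
    return 2.0 * (cl_pf - cstray_pf);
}
```

With a hypothetical 7 pF crystal and a guessed 2 pF of stray capacitance, `load_cap_each_pf(7.0, 2.0)` comes out at 10 pF per cap. For the accuracy side, `drift_seconds_per_day(20.0)` is about 1.7 seconds a day for a 20 ppm crystal, while an oscillator that's off by 1% (10000 ppm) would drift 864 seconds, roughly 14 minutes a day.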
If your board is going to be USB powered (or some other non-battery source), treat VBAT as a power pin. Tie it to VDD and decouple to ground. If you want the RTC to run while the rest of the chip is powered down, then you'll have to do a little more work.
Since you're going to the trouble to build a board, why not build in a way to power it too? Add a USB connector and a 3.3V LDO and you're pretty much there.
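One thing worth checking before you pick the LDO is how much heat it has to get rid of, since a linear regulator burns off the whole input-to-output difference. A minimal sketch in C, assuming a nominal 5 V from USB and a made-up 100 mA load (the function name is just for illustration):

```c
/* Power dissipated in a linear regulator, in milliwatts.
 * P = (Vin - Vout) * Iload, ignoring the regulator's own quiescent
 * current. Inputs are in millivolts and milliamps so the arithmetic
 * stays in integers. */
long ldo_dissipation_mw(long vin_mv, long vout_mv, long iload_ma)
{
    return (vin_mv - vout_mv) * iload_ma / 1000;
}
```

For example, `ldo_dissipation_mw(5000, 3300, 100)` gives 170 mW. Compare that against the package's thermal rating in the regulator's datasheet.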
EDIT: Attached is what I used in a project last year. The MCU was an STM32F405 in a 64-pin package.