miguel: PSoC looks potentially interesting, but I won't go with one for this. I don't really understand what it does, analog-wise. The few things I looked at while trying to get an overview of what it is were all marketing. Kinda sad.
rs20: Yes, I need to correlate the timing with the PC, which orchestrates other aspects of the overall process. In theory it can all be done externally to the PC, but it would be more expensive, and above all more complex than would be wise for me to attempt at this stage. Another option, maybe, could be a Linux system tweaked to prioritize USB.
andersm: That's more or less the idea: start prototyping, and worst case just learn something about what's involved.
George: Not in Europe. But I didn't actually check the shipping; maybe it is $15.
amyk: I bet in many cases you can still find at least headers for serial or parallel, but not on laptops. But even on serial or parallel I'm not sure what latency or jitter you can expect nowadays. The bus they're on is probably also radically different between computers, depending on the CPU or chipset. Most likely a 386 running DOS would be a better platform.
peter: Worst case I will have gained some knowledge of embedded development. Yeah, that's generally what I mean by "latency" or "jitter". Half duplex is not a problem.
Sandun: 4µs sounds great.
LPC134x's ROM code for HID is interesting. Though I suppose if you go with faster 32-bit MCUs anyway, there's no need for dedicated hardware. What I had in mind initially was real hardware that enables good HID performance on cheap/modest MCUs, but by the looks of it, it's not necessary even there.
Anyway, thanks for the suggestions everyone!