janoc: Regardless of USB class or transfer type I need to settle on a chip first before I could experiment. (BTW, some mice/drivers do have a 1ms polling rate.)
You are confusing the polling rate that you can request from the host with the actual latency you will get. Those are two very different things.
If you take the interrupt transfer (e.g. HID), your device gets polled the first time and signals the host that it wants to do a transfer. However, you don't know when you are actually going to get the incoming control transfer from the host that sets up the actual data transfer containing the HID report. That does not necessarily happen right away (e.g. at the next polling period).
That is completely dependent on the situation in the OS: how it handles interrupts from the USB host controller, load, task scheduling, etc. Common latencies there are on the order of 10 ms or so. A faster polling rate does not help you at all. That's why these "gaming mice" with super fast polling are mostly a gimmick: beyond a certain polling rate the host will not be able to service the device as fast as it is asking, and the extra interrupts only increase system load.
See why the 125us limit of USB 2.0 is a red herring here? The overhead of the interrupt transfers needed for something like HID is going to blow your latencies out of the water. USB 2.0 is good if you need to transfer a LOT of data fast, but that isn't your case.
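To put rough numbers on that argument, here is a minimal sketch. It assumes a very simple model (worst case = one full polling interval plus a fixed host-side delay) and uses the ~10 ms figure from above. The 8 ms interval is only an illustrative "slow" setting, and the function name is made up for this example.

```python
# Host-side latency figure taken from the post ("on the order of 10 ms").
HOST_LATENCY_MS = 10.0

# Polling intervals to compare:
#   8.0   ms - illustrative slow HID setting (assumption)
#   1.0   ms - one USB 1.1 full-speed frame
#   0.125 ms - one USB 2.0 high-speed microframe
POLL_INTERVALS_MS = [8.0, 1.0, 0.125]


def worst_case_latency_ms(poll_interval_ms, host_latency_ms=HOST_LATENCY_MS):
    """Simplified model: the event just missed a poll, so it waits one full
    interval, and then the host needs host_latency_ms to deliver the data."""
    return poll_interval_ms + host_latency_ms


def report():
    """Return (interval, worst case, share of worst case due to polling)."""
    rows = []
    for interval in POLL_INTERVALS_MS:
        total = worst_case_latency_ms(interval)
        rows.append((interval, total, interval / total))
    return rows


if __name__ == "__main__":
    for interval, total, share in report():
        print(f"poll {interval:6.3f} ms -> worst case {total:6.3f} ms "
              f"(polling is {share:.0%} of it)")
```

Under this model, going from 1 ms to 0.125 ms polling only moves the worst case from 11 ms to 10.125 ms. The host-side term dominates either way.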
Let's take the isochronous transfer instead. In that case you have only one control transfer setting it up, and afterwards you are streaming data to the host at a defined speed, no matter what else is going on on the bus. Your transfer even gets priority over stuff like bulk or interrupt transfers, because isochronous mode guarantees constant latency. However, to use this you cannot insist on HID; you will need to write your own driver and firmware.
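A quick way to check whether a data stream fits into full-speed isochronous transfers is to work out how many bytes have to go into each 1 ms frame. The sketch below assumes one isochronous packet per frame and the full-speed limit of 1023 bytes per isochronous packet. The sample rates and sizes are made-up inputs, and the function names are hypothetical.

```python
import math

FRAME_MS = 1               # full-speed USB frame period in milliseconds
MAX_ISO_PAYLOAD = 1023     # max payload of one full-speed isochronous packet


def bytes_per_frame(sample_rate_hz, bytes_per_sample):
    """Bytes that must be sent per frame, rounding the sample count up so
    that the busiest frame still fits."""
    samples_per_frame = math.ceil(sample_rate_hz * FRAME_MS / 1000)
    return samples_per_frame * bytes_per_sample


def fits_in_iso_packet(sample_rate_hz, bytes_per_sample):
    """True if one isochronous packet per frame can carry the stream."""
    return bytes_per_frame(sample_rate_hz, bytes_per_sample) <= MAX_ISO_PAYLOAD


if __name__ == "__main__":
    # Hypothetical sensor: 8000 samples/s, 2 bytes each.
    print(bytes_per_frame(8000, 2), fits_in_iso_packet(8000, 2))
```

For a low-rate sensor like the hypothetical one here, the per-frame payload is a small fraction of what a single packet can carry.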
Re chips: small USB 2.0 micros are rare; I think only some Cypress PSoCs have USB 2 transceivers. USB 2.0 is way too fast and needs way too much memory to service on the small chips (another reason to stick to USB 1.1). You would have to step up to some bigger ARM SoC or similar, but then we are talking about devices on the level of a BeagleBoard or something that could power a smartphone.
If you stick to USB 1.1, then you can get the 12 Mbps full-speed transceiver in many micros. I have good experience with the STM32F1 and STM32F3xx lines; the LPC17xx from NXP is also decent.
PIC was also mentioned. While it does work, the tooling and the libraries from Microchip leave a lot to be desired: it is not very easy to make work, even if you know what you are doing. Moreover, their libraries were not compiling with their new compilers, only with the old C18 one.
In addition, the smallest USB-enabled chips like the 18F14K50 have a weird memory layout with only a very tiny slice of the address space accessible to the USB hardware. That makes for "interesting" debugging and development, where you have to count every byte of your HID report in order to make sure it still fits in there.
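The byte counting can at least be automated on the PC side before anything touches the chip. Here is a minimal sketch using Python's `struct` module. The report layout and the 16-byte buffer budget are both made up for illustration; the real figures come from your report descriptor, the datasheet and your linker map.

```python
import struct

# Hypothetical report layout (not from any real device): little-endian,
# 1 byte of buttons, two signed 16-bit axes, 1 signed byte for a wheel.
# "<" disables padding, so the size is exactly the sum of the fields.
REPORT_FORMAT = "<Bhhb"

# Assumed number of bytes available for this endpoint buffer in the
# USB-visible RAM. Made-up figure for illustration only.
USB_BUFFER_BUDGET = 16

REPORT_SIZE = struct.calcsize(REPORT_FORMAT)


def pack_report(buttons, x, y, wheel):
    """Serialize one report exactly as it would go over the wire."""
    return struct.pack(REPORT_FORMAT, buttons, x, y, wheel)


def fits(budget=USB_BUFFER_BUDGET):
    """True if the report fits in the assumed buffer budget."""
    return REPORT_SIZE <= budget


if __name__ == "__main__":
    print(f"report is {REPORT_SIZE} bytes, budget {USB_BUFFER_BUDGET}, "
          f"fits: {fits()}")
```

Adding a field to `REPORT_FORMAT` changes `REPORT_SIZE` immediately, so an overflow shows up here rather than as memory corruption on the target.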
Some of Atmel's ATmega and XMega chips also have USB hardware, but I don't have experience with those. I have only used the V-USB bit-banged low-speed USB on Atmels.