OK, to simplify it, your microcontroller choice will be heavily influenced by:
- Needing a USB OTG interface, preferably with a free stack that can handle hubs.
- Having a DMA that can shove data out to your display at the rate you want. Take a 320x480 display with a 16-bit interface: you can treat most screens that have their own controller like a memory, so you hook the display onto the external 16-bit memory bus and can then do about 95% of the interfacing with DMA.
OK, so this leaves the rest of the sub-requirements, such as selecting which images to load as the background and for the buttons. To draw one, you point the DMA at the image's start location and load the number of bytes for the horizontal width, then when you get the transfer-complete flag you load the next line, rendering down the image with almost no processing input on your part.
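A rough sketch of that line-by-line transfer, in C. Everything here is an assumption for illustration: `lcd_mem` is a plain RAM array standing in for the display, `dma_copy` is a hypothetical wrapper that just calls `memcpy`, and the loop stands in for what you would really do from the transfer-complete interrupt on your particular part.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LCD_W 320
#define LCD_H 480

/* Stand-in for the display; on real hardware this would be the LCD
   controller's window on the external bus, not a RAM array. */
static uint16_t lcd_mem[LCD_W * LCD_H];

/* Hypothetical DMA wrapper. Here it is only a memcpy; on a real part you
   would program source, destination and length, then wait for the
   transfer-complete flag or interrupt. */
static void dma_copy(uint16_t *dst, const uint16_t *src, size_t bytes)
{
    memcpy(dst, src, bytes);
}

typedef struct {
    const uint16_t *pixels; /* 16-bit pixels, row-major */
    uint16_t w, h;          /* size in pixels */
} image_t;

/* Draw an image with its top-left corner at (x, y), one transfer per line. */
static void blit_image(const image_t *img, uint16_t x, uint16_t y)
{
    assert(x + img->w <= LCD_W && y + img->h <= LCD_H);
    for (uint16_t row = 0; row < img->h; row++) {
        /* One horizontal line: width in pixels times 2 bytes per pixel. */
        dma_copy(&lcd_mem[(size_t)(y + row) * LCD_W + x],
                 &img->pixels[(size_t)row * img->w],
                 (size_t)img->w * sizeof(uint16_t));
    }
}
```

Calling `blit_image(&button_img, 10, 20)` would place a (hypothetical) `button_img` at column 10, row 20. On real hardware the CPU only reloads the address and count between lines, which is where the "almost no processing" comes from.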
Now the harder part: you have a mouse, so you want a cursor. There are a few ways to approach it, but it will generally require either:
- a full frame buffer that you modify, rendering the cursor on top of it afterwards (for 320x480 at 16 bits per pixel that is 307,200 bytes of RAM), or
- a change buffer, where you keep track of what the cursor occludes, then once it moves to the next spot you re-render that space, record the next occluded space, and then render the cursor.
One costs you far more processing time than the other, and there are other ways to approach it.
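Here is a minimal sketch of the change-buffer idea, under some loud assumptions: `screen` is a RAM array standing in for the display, the cursor is a solid 8x8 block rather than an arrow, and the occluded pixels are read back from `screen`. If your LCD cannot be read back, you would re-render the occluded area from the source image instead. All names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

#define SCR_W 320
#define SCR_H 480
#define CUR_W 8
#define CUR_H 8

static uint16_t screen[SCR_W * SCR_H]; /* stand-in for the display */
static uint16_t saved[CUR_W * CUR_H];  /* pixels the cursor is hiding */
static int saved_x, saved_y, saved_valid;

/* Put back whatever the cursor was covering. */
static void cursor_erase(void)
{
    if (!saved_valid)
        return;
    for (int r = 0; r < CUR_H; r++)
        for (int c = 0; c < CUR_W; c++)
            screen[(saved_y + r) * SCR_W + saved_x + c] = saved[r * CUR_W + c];
    saved_valid = 0;
}

/* Move the cursor: restore the old spot, record the new occluded
   area, then draw the cursor over it. */
static void cursor_move(int x, int y, uint16_t colour)
{
    assert(x >= 0 && y >= 0 && x + CUR_W <= SCR_W && y + CUR_H <= SCR_H);
    cursor_erase();
    for (int r = 0; r < CUR_H; r++) {
        for (int c = 0; c < CUR_W; c++) {
            uint16_t *p = &screen[(y + r) * SCR_W + x + c];
            saved[r * CUR_W + c] = *p; /* remember what is underneath */
            *p = colour;               /* draw the cursor pixel */
        }
    }
    saved_x = x;
    saved_y = y;
    saved_valid = 1;
}
```

The buffer is only 64 pixels (128 bytes) for this cursor size, and each move touches two small rectangles instead of the whole screen.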
I'm far more used to 4-10 button interfaces with an encoder for settings that need fine steps over a long range. In that case I defined render areas and cut my UIs into image chunks, so I only had to DMA-render a chunk to, say, show a button. For text, I had the processor assemble it in chunks using array rotation. Essentially I stored the fonts as 16-bit ints: vertical strips 16 pixels high, and however many strips wide the font size needed, each strip just being a mask. I could then assemble words by pulling in the first strip and ANDing it with a mask so only 1 bit was left; if it was a 1, I sent out the 2 bytes from the text colour property. I went along the array of strips for the text until the end, then doubled the mask to get to the next bit and walked along the strips again, rotating the vertical strips into the horizontal colour runs the LCD expected.
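The strip rotation described above might look roughly like this in C. This is a sketch and not my original code: it assumes bit 0 of each strip is the top pixel, it writes into an output buffer instead of straight to the LCD, and it emits a background colour for 0 bits (you could just as well skip those pixels). The function name and parameters are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GLYPH_H 16 /* each strip is one column, 16 pixels high */

/* Turn n_strips vertical 16-bit strips into 16 horizontal rows of pixels.
   out must hold n_strips * GLYPH_H entries, row-major. */
static void render_strips(const uint16_t *strips, size_t n_strips,
                          uint16_t fg, uint16_t bg, uint16_t *out)
{
    assert(strips != NULL && out != NULL);
    uint16_t mask = 1; /* selects the current pixel row */
    for (int row = 0; row < GLYPH_H; row++) {
        /* Walk along every strip of the text for this one row. */
        for (size_t col = 0; col < n_strips; col++)
            *out++ = (strips[col] & mask) ? fg : bg;
        mask = (uint16_t)(mask << 1); /* "double" the mask: next row */
    }
}
```

For a word, you would concatenate the strips of each letter (plus any gap columns) into one array and call this once, so each output row is a complete horizontal run the LCD can take as is.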
That was me chasing as much speed as I could get with foreground tasks, while still wanting a few font sizes and things like italics and bold. It left only a few letter pairs needing special rules, like VA, where the gap is actually 0 because the letters get squished together.
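One way to handle those special pairs is a small lookup table, sketched below. The default gap of 1 blank column is an assumption, the table holds only the VA pair mentioned above, and `glyph_gap` and `text_gap_total` are hypothetical names; you would add whatever other pairs your font needs.

```c
#include <assert.h>
#include <stddef.h>

#define DEFAULT_GAP 1 /* assumed: blank columns between ordinary letters */

typedef struct {
    char left, right; /* the letter pair */
    int gap;          /* blank columns to leave between them */
} kern_pair_t;

/* Special-case pairs; only VA comes from the description above. */
static const kern_pair_t kern_table[] = {
    { 'V', 'A', 0 },
};

/* Gap in columns between two neighbouring letters. */
static int glyph_gap(char left, char right)
{
    for (size_t i = 0; i < sizeof kern_table / sizeof kern_table[0]; i++)
        if (kern_table[i].left == left && kern_table[i].right == right)
            return kern_table[i].gap;
    return DEFAULT_GAP;
}

/* Total gap columns needed for a whole string. */
static int text_gap_total(const char *s)
{
    assert(s != NULL);
    int total = 0;
    for (; s[0] != '\0' && s[1] != '\0'; s++)
        total += glyph_gap(s[0], s[1]);
    return total;
}
```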