The ATmega32u4 is one of my favourite microcontrollers. It has a native low-speed/full-speed (1.5 or 12 Mbit/s, respectively) hardware USB interface.
All your users need to flash new firmware is an ordinary application that talks to the device over USB; no programmer hardware is required.
The Arduino environment uses the AVR109 protocol for uploading firmware to '32u4-based microcontrollers. (The AVR910 in-system programming protocol, which uses the hardware SPI pins with another microcontroller or a USB-to-SPI bridge as the programmer, is used when you want to reprogram the microcontroller completely, including the bootloader.)
You do not need the entire Arduino environment: it only uses avrdude for flashing. You can even implement the protocol yourself, so that your own standalone, single-file firmware updater is an ordinary application that flashes the Arduino .hex file to your microcontroller.
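Arduino (avr-gcc) writes the compiled firmware as an Intel HEX file, so a standalone updater first has to turn that file into a flat binary image. Below is a minimal Python sketch, assuming only record types 00 (data), 01 (end of file), 02 and 04 (extended address) matter; `parse_intel_hex()` is a hypothetical name, and gaps between records are filled with 0xFF, the value of erased flash.

```python
def parse_intel_hex(text: str, fill: int = 0xFF):
    """Parse Intel HEX text into (base_address, bytearray image).

    Handles record types 00 (data), 01 (end of file), 02 (extended segment
    address) and 04 (extended linear address); other types are ignored.
    Gaps between data records are filled with `fill`.
    """
    chunks = {}  # absolute byte address -> record data
    upper = 0    # address offset set by type 02/04 records
    for lineno, line in enumerate(text.splitlines(), 1):
        line = line.strip()
        if not line:
            continue
        if not line.startswith(":"):
            raise ValueError(f"line {lineno}: missing ':'")
        raw = bytes.fromhex(line[1:])
        # count(1) + address(2) + type(1) + data(count) + checksum(1)
        if len(raw) < 5 or len(raw) != raw[0] + 5:
            raise ValueError(f"line {lineno}: bad record length")
        if sum(raw) & 0xFF:  # all bytes incl. checksum must sum to 0 mod 256
            raise ValueError(f"line {lineno}: bad checksum")
        count, addr, rtype = raw[0], (raw[1] << 8) | raw[2], raw[3]
        data = raw[4:4 + count]
        if rtype == 0x00:
            chunks[upper + addr] = data
        elif rtype == 0x01:
            break
        elif rtype == 0x02:
            upper = int.from_bytes(data, "big") << 4
        elif rtype == 0x04:
            upper = int.from_bytes(data, "big") << 16
    if not chunks:
        return 0, bytearray()
    start = min(chunks)
    end = max(a + len(d) for a, d in chunks.items())
    image = bytearray([fill]) * (end - start)
    for a, d in chunks.items():
        image[a - start:a - start + len(d)] = d
    return start, image
```

For an ordinary Arduino sketch the returned base address is normally 0, since the application starts at the beginning of flash.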
The protocol runs over USB Serial and is easy to implement. The "trick" is that, depending on the bootloader, the '32u4 only responds to the protocol for a short while after a reset, or if a specific pin is in a specific state at boot. If your firmware implements USB Serial, you can use a baud rate change to switch to firmware update mode: the device resets, and the bootloader then enumerates as a USB Serial device that speaks AVR109, as Arduino Leonardo bootloaders do. If your firmware implements USB HID instead, a custom HID command (from host to device) can serve the same purpose. If you write your own bootloader, as PJRC did (the proprietary HalfKay bootloader for the ATmega32u4-based Teensy 2.0), you can even use HID for the bootloading itself, albeit a bit slower than USB Serial (just a few kbytes per second).
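To show how little is involved, here is a minimal Python sketch of the flash-writing half of AVR109, based on my understanding of the command set in Atmel's AVR109 application note: `b` (block-size query), `e` (chip erase), `A` (set address, in 16-bit words for flash), `B` size `F` (block write to flash) and `E` (exit bootloader). It assumes a `port` object with `write()` and `read(n)` methods, and assumes, as in Caterina, that the reported block size equals the flash page size and that blocks must be page-aligned. The function names are hypothetical; check your bootloader's source before relying on this.

```python
def _cmd(port, cmd: bytes, reply_len: int = 1) -> bytes:
    """Send one AVR109 command and read a fixed-length reply."""
    port.write(cmd)
    reply = port.read(reply_len)
    if len(reply) != reply_len:
        raise RuntimeError(f"short reply to command {cmd[:1]!r}")
    return reply


def _expect_cr(port, cmd: bytes) -> None:
    """Write-type AVR109 commands are acknowledged with a carriage return."""
    if _cmd(port, cmd) != b"\r":
        raise RuntimeError(f"command {cmd[:1]!r} not acknowledged")


def avr109_flash(port, image: bytes, base: int = 0) -> None:
    """Erase the application flash and write `image` at byte address `base`,
    one block (assumed to be one flash page) at a time."""
    # 'b': block-mode query; reply is b'Y' followed by a 16-bit block size.
    reply = _cmd(port, b"b", 3)
    block = int.from_bytes(reply[1:3], "big")
    if reply[:1] != b"Y" or block == 0:
        raise RuntimeError("bootloader does not support block mode")
    # Align the start down to a block boundary and pad the tail with 0xFF
    # (the erased-flash value) so every block covers exactly one page.
    start = base - base % block
    data = b"\xff" * (base - start) + bytes(image)
    data += b"\xff" * (-len(data) % block)
    _expect_cr(port, b"e")  # 'e': erase the application flash
    for offset in range(0, len(data), block):
        # 'A': set address; for flash this is a 16-bit *word* address.
        word_addr = (start + offset) // 2
        _expect_cr(port, b"A" + word_addr.to_bytes(2, "big"))
        # 'B' size_hi size_lo 'F' <data>: block write to flash memory.
        chunk = data[offset:offset + block]
        _expect_cr(port, b"B" + len(chunk).to_bytes(2, "big") + b"F" + chunk)
    # 'E': leave the bootloader and start the application. The device may
    # drop off the bus immediately, so the reply is read but not checked.
    port.write(b"E")
    port.read(1)
```

With pyserial, a `serial.Serial("/dev/ttyACM0", timeout=1)` object provides compatible `write()` and `read(n)` methods. A real updater would also read the flash back and verify it before sending `E`.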
The bootloader used on Arduino Leonardo and Pro Micro clones is Caterina, which is based on Dean Camera's LUFA library. You might want to take a look at LUFA; it may already include tools you can use.
On my Linux machine, I use a wrapper around avrdude when flashing my Pro Micro clones (ATmega32u4, same pinout as the SparkFun Pro Micro, Arduino Leonardo bootloader). It uses the baud rate change (to 1200 baud) to auto-reset the clone, waits until the bootloader's USB Serial device has been enumerated, and then runs avrdude with the correct parameters (pointing at the re-enumerated USB Serial device). That way, I only need to reset the clone manually (by shorting the RESET pin to GND) if my firmware does not use USB Serial, since then there is no USB Serial device whose baud rate I could change.
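A minimal, Linux-only sketch of what such a wrapper does before invoking avrdude. It assumes the USB Serial ports appear as `/dev/ttyACM*` (the Linux cdc_acm naming) and that, as with the Arduino core, the firmware reboots into the bootloader when the port is opened at 1200 baud and then closed with DTR dropped. `touch_1200()` and `wait_for_bootloader_port()` are hypothetical helper names. Because the bootloader may come back under the same device name as the original port, the wait also accepts that name once it has been seen to disappear.

```python
import glob
import os
import termios
import time


def _acm_ports():
    """Assumed device naming: Linux cdc_acm creates /dev/ttyACM<n>."""
    return glob.glob("/dev/ttyACM*")


def touch_1200(path: str) -> None:
    """Open `path`, switch it to 1200 baud, and close it again."""
    fd = os.open(path, os.O_RDWR | os.O_NOCTTY | os.O_NONBLOCK)
    try:
        attrs = termios.tcgetattr(fd)
        attrs[2] |= termios.HUPCL            # cflag: drop DTR on close
        attrs[4] = attrs[5] = termios.B1200  # input and output speed
        termios.tcsetattr(fd, termios.TCSANOW, attrs)
    finally:
        os.close(fd)


def wait_for_bootloader_port(old_path, before, list_ports=_acm_ports,
                             timeout=10.0, interval=0.1):
    """Return the bootloader's serial device path, or None on timeout.

    A port counts as the bootloader if it was absent from `before`, or if
    it appears after `old_path` has been seen to disappear (the bootloader
    may reuse the same name).
    """
    others = set(before) - {old_path}
    gone = False
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        current = set(list_ports())
        if old_path not in current:
            gone = True
        candidates = sorted(current - (others if gone else set(before)))
        if candidates:
            return candidates[0]
        time.sleep(interval)
    return None
```

Typical use: record `before = glob.glob("/dev/ttyACM*")`, call `touch_1200("/dev/ttyACM0")`, and pass the path returned by `wait_for_bootloader_port("/dev/ttyACM0", before)` to avrdude's `-P` option, together with `-c avr109 -p atmega32u4 -U flash:w:firmware.hex:i` (where `firmware.hex` stands in for your file). If the old port vanishes and reappears between two polls, this sketch misses it, so keep `interval` short.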
The somewhat annoying part is that you really do need to write and test the application on every OS you intend to support.
For Linux, you'll need to compile a version for each hardware architecture (x86, x86-64/AMD64, some ARM variants, perhaps 32-bit MIPS), and link it statically so that the same binary works across distributions.
Of course, if you want your device to be used with different OSes, you need to test it on them too, so having the necessary hardware, and perhaps virtual machines for testing different OS versions, is a good idea.