I tried a lot of different things. Here are some of them:
Using the Olimex AVR ISP mkII:
I cut my app after the first 8k, erased the chip and uploaded that, then read it back and compared the files with vbindiff (a binary diff tool). They are 100% equal in the first 8k, but the data is repeated every 8k until 0x20000 is reached; the rest is filled with 0xFF.
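For anyone who wants to check a read-back without eyeballing it in a diff tool, here is a rough sketch of the same comparison. The function name and the synthetic dump are made up for illustration; a real dump would come from the file read back from the chip.

```python
BLOCK = 0x2000        # 8k, the repeat distance seen in the read-back
MIRROR_END = 0x20000  # address where the repetition stopped

def is_mirrored(dump: bytes, block: int = BLOCK, mirror_end: int = MIRROR_END) -> bool:
    """True if dump[0:block] repeats up to mirror_end and the rest is 0xFF."""
    first = dump[:block]
    for offset in range(block, mirror_end, block):
        if dump[offset:offset + block] != first:
            return False
    return all(b == 0xFF for b in dump[mirror_end:])

# Synthetic stand-in for a real read-back file (hypothetical data).
app = bytes(i % 251 for i in range(BLOCK))
dump = app * (MIRROR_END // BLOCK) + b"\xff" * 0x1000
print(is_mirrored(dump))
```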
I erased the chip again, uploaded the whole app (~10k, the CDC example from ASF compiled with the newest avr-libc), read it back and compared with vbindiff as before. After a while I realized that what I see is the bitwise AND of the first 8k of my app and the data after it, and that this 8k block is repeated every 8k.
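To make that observation concrete, this sketch predicts what such a read-back would look like if every address simply wrapped modulo 8k and each write could only clear bits. That wrapping model is my assumption to explain the symptom, not something taken from the datasheet, and `fold_image` is a made-up helper name.

```python
def fold_image(image: bytes, block: int = 0x2000) -> bytes:
    """AND all block-sized chunks of image onto one erased block."""
    folded = bytearray(b"\xff" * block)  # erased flash reads as 0xFF
    for i, byte in enumerate(image):
        folded[i % block] &= byte        # writes can only clear bits
    return bytes(folded)

# Hypothetical ~10k image: 8k of 0xF0 followed by 2k of 0x3C.
image = bytes([0xF0]) * 0x2000 + bytes([0x3C]) * 0x800
folded = fold_image(image)
print(hex(folded[0x0000]), hex(folded[0x0800]))
```

The first 2k of the folded block are the AND of both chunks, while the rest only carries the first 8k, which matches what vbindiff showed for my app.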
I tried some other stuff which all ended up with the same result described above.
I tried writing to 0x820000 instead of 0x800000 and got the same result, except that the repeated area was from 0x20000 to 0x40000, again in 0x2000 steps. So exactly the same as before, just with a 0x20000 offset.
I tried programming the "boot" area at 0x840000: apps < 8k worked, with 0x0-0x40000 being empty (0xFF); apps > 8k gave the expected "app too large" error.
Finally, to check whether the programmer is at fault, I modified the xBoot bootloader for xmegas so that it programs a 256 byte buffer filled with 0x00, 0x01, ..., 0xfe, 0xff at various offsets, reads it back and activates different LEDs depending on the result. The behaviour is exactly as before: writing at 0x2000 overwrote 0x0000, writing at 0x2001 overwrote 0x0001 and so on.
I also read that result back with avrdude via PDI to confirm it again.
There are only two things that don't exactly add up:
a) when using avrdude, a page erase > 0x2000 did not erase the page at 0x0000, as the behaviour described above would suggest, but when using xBoot, a page erase at 0x2000 did erase the page at 0x0000
b) when I played around with writing at 0x0000, then 0x2000 and then 0x4000 via xBoot, I did not always get the bitwise AND of all three bytes. For example, I tried writing
0b11111111 at 0x0000
0b11110000 at 0x2000 and
0b00111100 at 0x4000 and got
0b00000000 as the result instead of the expected 0b00110000. I'm not sure whether I made a mistake, though: later tests worked, and it was already late at night, so I had no patience left to test this properly.
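Just to double check the expected value in b), here is a tiny model of the behaviour I assumed: addresses wrap every 0x2000 and a write without erase can only clear bits. The `AliasedFlash` class is purely illustrative and not how the NVM controller is documented to work.

```python
class AliasedFlash:
    """Toy flash whose addresses wrap modulo `window` (assumed model)."""

    def __init__(self, window: int = 0x2000):
        self.window = window
        self.cells = bytearray(b"\xff" * window)  # erased state

    def write(self, addr: int, value: int) -> None:
        # Programming without erase can only turn 1 bits into 0 bits.
        self.cells[addr % self.window] &= value

    def read(self, addr: int) -> int:
        return self.cells[addr % self.window]

flash = AliasedFlash()
for addr, value in [(0x0000, 0b11111111),
                    (0x2000, 0b11110000),
                    (0x4000, 0b00111100)]:
    flash.write(addr, value)

print(f"{flash.read(0x0000):08b}")  # 00110000
```

So under that model 0b00110000 is what should come back, and the 0b00000000 I got does not fit it.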
Anyways, thanks for reading this WoT.
There are only a few things that could still be the culprit. I thought I read something about the NVM shuffling areas in flash to increase lifetime, but I can't find it anywhere in the docs anymore. There's also the apptable, which is 8k as well and can be protected; maybe it's doing something fishy. I'm currently still reading through the manual and datasheet line by line.