Do you remember exactly which NOR flash chips you used?
Sorry, I can't give you any specific part numbers. It was a while ago, and all the schematics, BOMs, etc. were the company's property, not mine, so I don't have any record of these details. All I can remember is that they were bog-standard cheap serial flash chips, 8-pin SOICs if I remember correctly, from a variety of vendors. I doubt that's very helpful, though.

ISSI IS25xxxx devices: "A program operation can alter “1”s into “0”s. The same byte location or page may be programmed more than once, to incrementally change “1”s to “0”s. An erase operation is required to change “0”s to “1”s"
Every single part we tried worked in this way.
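
To make the quoted behaviour concrete, here is a minimal sketch that models it in RAM. It is not driver code for any real part: the names `sim_erase`, `sim_program`, `sim_read` and the 4096-byte sector size are assumptions made up for the illustration. The only point is that a program operation leaves each cell as (old AND new), so bits can go from 1 to 0 but never back without an erase.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 4096u /* assumed erase granularity for this sketch */

/* RAM stand-in for one flash sector; erased flash reads back as 0xFF. */
static uint8_t sector[SECTOR_SIZE];

/* Erase: the only operation that turns 0 bits back into 1 bits. */
void sim_erase(void)
{
    memset(sector, 0xFF, sizeof sector);
}

/* Program: every cell becomes (old AND new), so bits only go 1 -> 0.
 * Returns 0 on success, -1 if the range falls outside the sector. */
int sim_program(size_t addr, const uint8_t *data, size_t len)
{
    if (addr > SECTOR_SIZE || len > SECTOR_SIZE - addr)
        return -1;
    for (size_t i = 0; i < len; i++)
        sector[addr + i] &= data[i];
    return 0;
}

/* Read one byte; out-of-range reads return 0xFF in this model. */
uint8_t sim_read(size_t addr)
{
    return addr < SECTOR_SIZE ? sector[addr] : 0xFF;
}
```

In this model, programming 0xF0 and then 0x3C to the same erased byte leaves 0x30, and programming 0xFF afterwards changes nothing; only `sim_erase` brings the byte back to 0xFF.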
Cypress S25FL064L: "For the very best performance, programming should be done in full pages of 256bytes aligned on 256byte boundaries with each Page being programmed only once."
This certainly seems correct - programming regions more than once, or programming sub-page regions, necessarily incurs more overhead. I used the "clear a bit in an already programmed byte" operation for correctness / atomicity, not for performance. You should try to follow this rule for writing bulk data.
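
For anyone wondering what "clear a bit for atomicity" can look like, here is a rough sketch of one common pattern. It is an illustration under stated assumptions, not the original code or record layout: the `record_t` structure, the status values, and the `slot_*` helpers are all hypothetical, and the flash is again modelled as RAM where programming ANDs the new data into the old. The idea is to write the payload first and only then clear one bit in a status byte, so a power loss in the middle leaves the record visibly uncommitted.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REC_PAYLOAD 15u
#define ST_ERASED   0xFFu /* slot never committed */
#define ST_VALID    0xFEu /* bit 0 cleared: payload fully written */
#define ST_OBSOLETE 0xFCu /* bit 1 also cleared: record superseded */

/* Hypothetical record layout: one status byte, then the payload. */
typedef struct {
    uint8_t status;
    uint8_t payload[REC_PAYLOAD];
} record_t;

static record_t slot; /* RAM stand-in for one record slot in flash */

/* Model of a program operation: bits can only go from 1 to 0. */
static void flash_and(uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] &= src[i];
}

void slot_erase(void) { memset(&slot, 0xFF, sizeof slot); }

/* Step 1: write the payload; the status byte stays erased. */
void slot_write_payload(const uint8_t *p)
{
    flash_and(slot.payload, p, REC_PAYLOAD);
}

/* Step 2: commit by clearing a single bit in the status byte. */
void slot_commit(void)
{
    uint8_t s = ST_VALID;
    flash_and(&slot.status, &s, 1);
}

/* Later: retire the record by clearing one more bit, with no erase. */
void slot_retire(void)
{
    uint8_t s = ST_OBSOLETE;
    flash_and(&slot.status, &s, 1);
}

int slot_is_valid(void) { return slot.status == ST_VALID; }
```

Each status transition only clears bits, so it is a legal re-program of an already programmed byte. Bulk payload data would still be better written once, in full aligned pages, as the datasheet recommends.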
One thing that I have heard about (I think) NAND flash, not sure if it's true or if it applies to NOR flash, is that writing '0' to an already programmed bit causes excessive wear.
This could well be correct. We only used (el-cheapo) NOR flash for this, never (high-density) NAND flash.