Code reviews only examine a single snapshot of a subset of the codebase.
We experienced a critical-failure bug during system integration testing of modules from two different teams: one European, the other American.
ModuleA was integrated with ModuleB. ModuleA (EU) used meters; ModuleB (USA) used inches.
The software went into "disable channels and halt", which means "unrecoverable critical failure, mission aborted".
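One cheap defense against this class of bug is to make the unit part of the type, so the compiler rejects an unconverted value at the module boundary. A minimal sketch in C (all names hypothetical, not the project's actual interfaces):

    #include <stdio.h>

    typedef struct { double value; } meters_t;   /* ModuleA's convention */
    typedef struct { double value; } inches_t;   /* ModuleB's convention */

    /* The single sanctioned crossing point between the two conventions. */
    static inches_t meters_to_inches(meters_t m)
    {
        inches_t in = { m.value / 0.0254 };
        return in;
    }

    /* ModuleB's interface takes inches_t; handing it a meters_t directly
     * is a compile-time type error instead of an integration-time bug. */
    static void moduleB_set_distance(inches_t d)
    {
        printf("distance: %.2f in\n", d.value);
    }

    int main(void)
    {
        meters_t d = { 100.0 };                     /* produced by ModuleA */
        moduleB_set_distance(meters_to_inches(d));  /* conversion is explicit */
        /* moduleB_set_distance(d);  <- would not compile */
        return 0;
    }

Of course this only helps where both teams share the typed interface; it would not have caught our bug, because each module was developed against its own conventions.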
Another problem we had: how do you really test abnormal, highly critical conditions?
We hit this problem on a Mach 2 aircraft. The firmware was DO-178B Level A, since the airplane's speed was 2500 km/h, with many low-level requirements classified as "critical".
If the software does something wrong and CBIT (continuous built-in test, there to assure the hardware is OK and the software is OK) catches it, it takes longer to resume (e.g. because of an overflow in the CORDIC/BKM unit, which must NOT happen). But a delay is critical when you are moving at 2500 km/h, because it pushes the airplane off its trajectory, which might be critical during a mission.
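To make the timing concern concrete, here is a minimal sketch (hypothetical names and budget, not the real firmware) of a CBIT cycle with a hard time budget, where overrunning the budget is treated the same as a failed check:

    #include <stdint.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define CBIT_DEADLINE_US 500u   /* assumed time budget per cycle */

    /* Stubs standing in for the real platform hooks (all hypothetical). */
    static uint32_t fake_clock_us;
    static uint32_t now_us(void) { return fake_clock_us; }
    static bool cbit_check_hw(void) { fake_clock_us += 100; return true; }
    static bool cbit_check_sw(void) { fake_clock_us += 100; return true; }
    static void enter_fail_safe(void)
    {
        puts("disable channels and halt");
        exit(EXIT_FAILURE);
    }

    void cbit_cycle(void)
    {
        uint32_t start = now_us();

        /* A failed check is an unrecoverable critical failure. */
        if (!cbit_check_hw() || !cbit_check_sw()) {
            enter_fail_safe();
        }

        /* At 2500 km/h even a *late* pass is a fault: the control loop
         * must not be starved, so overrunning the budget also halts. */
        if (now_us() - start > CBIT_DEADLINE_US) {
            enter_fail_safe();
        }
    }

    int main(void)
    {
        cbit_cycle();
        puts("CBIT pass within deadline");
        return 0;
    }

The point of the second check is that a self-test which answers correctly but too late is still a hazard at that speed.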
We have to care about both safety and the mission. Both are a must, and neither may fail.
We achieved full coverage, but we tested only a subset of the abnormal, highly critical conditions. In my report I wrote "test is not applicable for these sub-cases" (e.g. what happens if the hardware of a module gets damaged during an attack? How will it propagate its error to the main system: as we planned and tested, or in a new, unpredicted way? Will it be recoverable?).
That is a weakness, because there are conditions in the firmware whose behavior is not fully explored after system integration. We only know how a module reacts to the inputs of its test cases.
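On the ground, the closest substitute we know of for "hardware damaged during the attack" is software fault injection at the module boundary. A toy sketch of the idea (all names hypothetical):

    #include <stdint.h>
    #include <stdbool.h>
    #include <stdio.h>

    typedef struct { int16_t raw; bool valid; } sensor_word_t;

    /* Stand-in for the module under test. */
    static sensor_word_t module_read(void)
    {
        sensor_word_t w = { 1234, true };
        return w;
    }

    /* Injection point: simulate hardware damage by flipping data bits,
     * optionally letting the validity flag lie about it. */
    static sensor_word_t inject_fault(sensor_word_t w, uint16_t bit_mask,
                                      bool lie_about_validity)
    {
        w.raw = (int16_t)(w.raw ^ bit_mask);
        if (!lie_about_validity)
            w.valid = false;
        return w;
    }

    /* The behavior we want to observe: does the main system contain the
     * error as planned, or accept it and propagate it silently? */
    static void main_system_consume(sensor_word_t w)
    {
        if (!w.valid || w.raw < 0) {
            puts("contained: channel flagged, fallback value used");
            return;
        }
        printf("accepted: %d (potential silent propagation)\n", w.raw);
    }

    int main(void)
    {
        /* The sub-case the report could not cover on real hardware:
         * a damaged module that still asserts "valid". */
        main_system_consume(inject_fault(module_read(), 0x8000, true));
        main_system_consume(inject_fault(module_read(), 0x00F0, true));
        return 0;
    }

Even this only explores the corruptions you thought to inject; the genuinely unpredicted propagation paths remain exactly that.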