Mind that when you do something like:
low = (*big_p) & 0xff;
high = (*big_p) >> 8;
big_p++;    /* pointer arithmetic already scales by sizeof(array[0]) */
assuming array[0] is a 16-bit type (sizeof(array[0]) == 2, so each step advances exactly one element and no bytes are skipped), the compiler will first generate code that accesses the first byte, then the second byte, then increments the pointer. In later optimization passes (at anything above -O0), it should recognize that these steps are redundant (e.g., loading a 16-bit value into a pair of 8-bit registers and saving 'low', then loading them all over again, shifting by a whole register width and saving 'high'), and reduce it to sensible code. On AVR, for example, all that's really needed is to load a byte and increment the pointer, twice, which should reduce to a pair of post-increment loads (ld Rd, X+, or the equivalent Y+/Z+ forms). The increment is built into those instructions, so that's all there is to it.
This is why it's always a good idea to inspect the compiler's assembly output, to make sure the code does what you think it does.
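A self-contained version of that pointer walk, for experimenting, might look like the sketch below. It assumes a uint16_t source array and separate output arrays for the low and high bytes; split_words_ptr and its parameters are hypothetical names, not from any library.

```c
#include <stddef.h>
#include <stdint.h>

/* Split n 16-bit words into separate low-byte and high-byte arrays by
 * walking a pointer through the source.
 * split_words_ptr is a hypothetical name used for illustration. */
void split_words_ptr(const uint16_t *src, size_t n,
                     uint8_t *low, uint8_t *high)
{
    const uint16_t *p = src;
    for (size_t i = 0; i < n; i++) {
        uint16_t w = *p++;              /* load one element, advance by sizeof *p bytes */
        low[i]  = (uint8_t)(w & 0xffu); /* bits 0..7 */
        high[i] = (uint8_t)(w >> 8);    /* bits 8..15 */
    }
}
```

To see what your compiler actually makes of it, something like `avr-gcc -mmcu=atmega328p -Os -S split.c` writes the assembly to split.s (the MCU and file name are only examples). At -Os you would hope to find post-increment ld instructions in the loop, though the exact registers and scheduling will vary with compiler version.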

Instead of explicitly incrementing the pointer, indexing the array in a loop should compile to very similar code.
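For comparison, here is a sketch of the indexed form, again assuming a uint16_t source array; split_words_idx is a hypothetical name. Compiling a pointer-walk version and this one with optimization on and diffing the assembly is a quick way to confirm they come out (nearly) the same on your target.

```c
#include <stddef.h>
#include <stdint.h>

/* Split n 16-bit words into low-byte and high-byte arrays using array
 * indexing instead of an explicit pointer increment.
 * split_words_idx is a hypothetical name. With optimization enabled,
 * compilers typically turn src[i] in a simple loop like this into the
 * same sequential loads a hand-written pointer walk would produce. */
void split_words_idx(const uint16_t *src, size_t n,
                     uint8_t *low, uint8_t *high)
{
    for (size_t i = 0; i < n; i++) {
        low[i]  = (uint8_t)(src[i] & 0xffu); /* bits 0..7 */
        high[i] = (uint8_t)(src[i] >> 8);    /* bits 8..15 */
    }
}
```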
Such code is portable to other platforms, at some cost in memory or efficiency. For example, a 32-bit machine will typically have a 4-byte int, so an int array has a natural width of 4 bytes per element; as long as you're using the same portable logic, you're only writing and reading the lower 16 bits of each one. Memory is wasted, but access is not inefficient, because every element stays word-aligned. (You usually pay a price for misaligned access: on x86, a memory fetch costs a full bus cycle whether you read a single byte or the full bus width, so a misaligned word that straddles two bus words costs two fetches; many ARM cores, particularly older ones and Cortex-M0 class parts, don't support unaligned loads at all, so the compiler must generate separate fetch, mask and shift instructions. In both cases, the faster code loads an aligned word into a register and chops it up from there.) Likewise, on big-endian systems (e.g., the 6809) the pointer might have to stutter-step, costing execution time and code space, but the result remains correct.
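As a sketch of the "same portable logic" idea: the functions below store only the low 16 bits of each unsigned int, low byte first, using shifts and masks rather than relying on how the host lays out an int in memory. pack_le16 and unpack_le16 are hypothetical names, and low-byte-first order is an assumed convention; what matters is that both sides agree on it.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack the low 16 bits of each unsigned int into a byte buffer, low byte
 * first. Only shifts and masks are used, so the bytes produced are the
 * same on 16- or 32-bit, little- or big-endian targets.
 * pack_le16 is a hypothetical name; out must hold 2 * n bytes. */
void pack_le16(const unsigned int *src, size_t n, uint8_t *out)
{
    for (size_t i = 0; i < n; i++) {
        out[2 * i]     = (uint8_t)(src[i] & 0xffu);        /* bits 0..7  */
        out[2 * i + 1] = (uint8_t)((src[i] >> 8) & 0xffu); /* bits 8..15 */
    }
}

/* Inverse of pack_le16 (also hypothetical): rebuild each 16-bit value
 * from two bytes. Any bits above 15 in the original ints are gone. */
void unpack_le16(const uint8_t *in, size_t n, unsigned int *dst)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = (unsigned int)in[2 * i]
               | ((unsigned int)in[2 * i + 1] << 8);
    }
}
```

On a 32-bit target the upper 16 bits of each int are simply ignored, which is the memory-for-portability trade described above.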
There is no such thing as truly portable code. You can structure your code to suit some platforms, and you can use #defines to target multiple platforms, but only as many as you have actually tested (and have driver support for, which in an embedded context is probably the bigger challenge).
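A minimal sketch of the #define approach, assuming GCC or Clang: both predefine __BYTE_ORDER__ and __ORDER_LITTLE_ENDIAN__, so the code can pick a memcpy fast path on little-endian targets and fall back to plain shifts everywhere else. read_le16 is a hypothetical helper name. In keeping with the point above, treat each branch as supported only once it has been built and tested on real hardware.

```c
#include <stdint.h>
#include <string.h>

/* Read a little-endian 16-bit value from a byte buffer.
 * read_le16 is a hypothetical name. The fast path is only compiled in
 * where the compiler reports a little-endian target; any other compiler
 * or byte order gets the portable shift version. */
static inline uint16_t read_le16(const uint8_t *p)
{
#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__) && \
    (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
    uint16_t v;
    /* memcpy copes with an unaligned p and avoids strict-aliasing issues */
    memcpy(&v, p, sizeof v);
    return v;
#else
    /* cast to unsigned before shifting so a 16-bit int can't overflow */
    return (uint16_t)(p[0] | ((unsigned int)p[1] << 8));
#endif
}
```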
Tim