I think there is no problem. In the case you describe (the memory must actually be read, no optimizations allowed), the memory must be qualified volatile, and the standard memmove prototype has no volatile qualifier:
void *memmove(void *dest, const void *src, size_t n);
With GCC, passing a volatile pointer to memmove results in a warning even without -Wall:
t.c: In function ‘main’:
t.c:9:10: warning: passing argument 1 of ‘memmove’ discards ‘volatile’ qualifier from pointer target type [-Wdiscarded-qualifiers]
memmove(dst, src, 123);
^~~
In file included from t.c:2:0:
/usr/include/string.h:46:14: note: expected ‘void *’ but argument is of type ‘volatile char *’
extern void *memmove (void *__dest, const void *__src, size_t __n)
^~~~~~~
t.c:9:15: warning: passing argument 2 of ‘memmove’ discards ‘volatile’ qualifier from pointer target type [-Wdiscarded-qualifiers]
memmove(dst, src, 123);
^~~
In file included from t.c:2:0:
/usr/include/string.h:46:14: note: expected ‘const void *’ but argument is of type ‘volatile char *’
extern void *memmove (void *__dest, const void *__src, size_t __n)
^~~~~~~
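For reference, a t.c along these lines triggers the warnings above (a reconstruction; the names and sizes are made up and the line numbers will not match exactly):
#include <string.h>

volatile char dst[200];   /* volatile: every access must really happen */
volatile char src[200];

int main(void)
{
    memmove(dst, src, 123);   /* both arguments discard the volatile qualifier */
    return 0;
}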
So the developer would notice the mistake immediately. If they try to cope by not declaring the variables volatile, then the problem is not confined to the memmove call; it is everywhere in their own code as well, because no read is guaranteed anymore.
If you want to be helpful for some particular special case, you could introduce your own memmove_volatile with volatile-qualified arguments; then it would make sense to guarantee things like:
1) all src bytes are read,
2) all dst bytes are written,
3) preferably also the order of reads/writes,
and document how it works exactly.
But I bet that whoever needs such special requirements writes their own implementation in their own code instead.
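Purely as an illustration, a minimal sketch of such a memmove_volatile could be a plain byte loop over volatile pointers (the overlap handling and the self-copy case below are my assumptions, and exactly these details are what would need documenting):
#include <stddef.h>

/* For distinct regions: every src byte is read once, every dst byte is
 * written once, and the copy direction is chosen so overlapping regions
 * still end up correct. */
volatile void *memmove_volatile(volatile void *dst,
                                const volatile void *src, size_t n)
{
    volatile unsigned char *d = dst;
    const volatile unsigned char *s = src;

    if (d < s) {
        for (size_t i = 0; i < n; i++)       /* copy forwards  */
            d[i] = s[i];
    } else if (d > s) {
        for (size_t i = n; i > 0; i--)       /* copy backwards */
            d[i - 1] = s[i - 1];
    }
    /* d == s: the reads/writes are skipped here; whether that is acceptable
     * is precisely the kind of thing the documentation has to spell out. */
    return dst;
}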
I would also guess that the extra test is not worth it.
For discussion's sake, even if the test saved enough cycles to be measurable, you would still be able to optimise further by avoiding the call (and the parameter setup) altogether.
For instance, if the C compiler has inlining capability then
void *memmove(void *dest, const void *src, size_t len) {
    if (dest == src || len == 0) {
        return dest;                        /* nothing to move, skip the call entirely */
    }
    return asmmemmove(dest, src, len);      /* the actual asm implementation */
}
might be one way to remove the call, with only a slight impact on output size (assuming the compiler can see the wrapper at the call sites, e.g. via link-time optimization).
So I wouldn't put the test in the asm anyway.
For anyone who wants to see what black magic compilers can do nowadays: some time ago a friend and I made a set of tests with different, often quite convoluted, implementations of some simple operations. The code below is the only one in the set that failed to produce optimal output. Coincidentally, it calls memmove:
void swap(unsigned char* v) {   /* reverse the order of the 4 bytes at v */
    unsigned char tmp;
    memmove(&tmp, v, 1);
    memmove(v, v + 3, 1);
    memmove(v + 3, &tmp, 1);
    memmove(&tmp, v + 1, 1);
    memmove(v + 1, v + 2, 1);
    memmove(v + 2, &tmp, 1);
}
But it wasn't far from the optimum (two instructions more than necessary) and, considering what most people would expect, it does pretty well. gcc on x86_64 produces:
   0:   0f b6 07                movzbl (%rdi),%eax
   3:   0f b6 57 03             movzbl 0x3(%rdi),%edx
   7:   66 c1 47 01 08          rolw   $0x8,0x1(%rdi)
   c:   88 17                   mov    %dl,(%rdi)
   e:   88 47 03                mov    %al,0x3(%rdi)
  11:   c3                      retq
The optimum would of course be to use the bswap instruction (a load, a bswap, a store), avoiding two extra moves.
All the swap implementations, including this one, are completely eliminated and replaced by constants when they are declared static and applied to a constant.
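A sketch of what that looks like (the caller below is made up; swap is the same function as above, now static in the same translation unit):
#include <string.h>

static void swap(unsigned char *v)        /* same body as above, now static */
{
    unsigned char tmp;
    memmove(&tmp, v, 1);
    memmove(v, v + 3, 1);
    memmove(v + 3, &tmp, 1);
    memmove(&tmp, v + 1, 1);
    memmove(v + 1, v + 2, 1);
    memmove(v + 2, &tmp, 1);
}

unsigned int swapped_constant(void)
{
    unsigned char v[4] = { 0x11, 0x22, 0x33, 0x44 };
    swap(v);                  /* gcc -O2 typically emits no call here */
    unsigned int r;
    memcpy(&r, v, sizeof r);
    /* The whole function folds to "return 0x11223344" on a little-endian target. */
    return r;
}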
This is why optimizing before actually detecting a performance problem is seen as bad practice. Modern tools produce output that may be far from what you expect, to the point of replacing whole portions of code with constants. Early optimization can still be beneficial sometimes, but usually only when you have already done something 20 times before and have seen the compiler generate poor code for that particular solution 20 times.