Aren't such things meant to be debug/test tools, much like asserts, that highlight issues during development but aren't intended to be a cure for anything in production?
What is "a cure for [anything] in production"? I do not believe such a thing exists, just like I do not think security is something you can add on to a design. You design it in in the first place; and if it is sick or needs a cure, you need a redesign. Adding something on top to "cure" the design is a fallacy, and will fail.
In my view, in production, you either prevent a problem, or detect a problem; anything else is useless.
Heuristics like canaries are post-mortem indicators without false positives. They can help pinpoint the mechanism of a problem (because if a canary is dead, positively showing a problem, then the stack was for sure involved), but as problem indicators they are of dubious utility (because a problem can have occurred while the canary still looks perfectly alive).
The only reason I would use stack canaries in production would be if I had access to core dumps from problem cases, or if I had nothing better available.
Ataradov's objection to hardware stack pointer tracking was that existing techniques like canaries, or their extension of pre-filling the stack with a detectable pattern, are trivially implemented and sufficient for the use cases he can see.
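To make that technique concrete, here is a minimal sketch of such stack painting on a bare-metal target with a descending stack. The linker symbol names _stack_bottom and _stack_top, the pattern value, and the helper names are my own illustrative assumptions, not from the discussion:

#include <stddef.h>
#include <stdint.h>

/* Hypothetical linker symbols delimiting the stack region. With a
   descending stack, _stack_bottom is the lowest address the stack
   may grow down to. */
extern uint32_t _stack_bottom[], _stack_top[];

#define STACK_PAINT 0xC5C5C5C5u  /* arbitrary recognizable pattern */

/* Paint the currently unused stack, early at boot: fill from the bottom
   of the stack region up to roughly the current stack pointer, here
   approximated by the address of a local variable. */
void stack_paint(void)
{
    uint32_t marker;
    for (uint32_t *p = _stack_bottom; p < &marker; p++)
        *p = STACK_PAINT;
}

/* Later, scan for the high-water mark: the number of never-touched
   words tells how close the stack has come to overflowing. */
size_t stack_words_never_used(void)
{
    size_t n = 0;
    for (const uint32_t *p = _stack_bottom; p < _stack_top && *p == STACK_PAINT; p++)
        n++;
    return n;
}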
I strongly disagreed, because what I want in production is to catch and prevent the stack overflow in the first place. (In later messages, after SiliconWizard pointed out that both Ataradov and I were conflating buffer under/overruns with stack overflow – the first being accesses outside the intended target object, the second needing/using more stack than is available – I pointed out that catching buffer overflows needs support from the compiler, because the current C and C++ rules are such that compilers do not complain even about obvious "this is very possibly outside the target object, and therefore a possible bug" code patterns that stick out to me like a sore thumb.)
Seeing that you seem to share Ataradov's view of the situation forces me to think hard about why that is.
To me, the situation in production is simple: A) Heuristics are useless for post-mortems, because we don't get core dumps. B) Heuristics can detect some problem situations, but not all; and the likelihood of a problem being detected is related to the computational effort spent. This means that to make a robust thing, we must always balance computational efficiency against the statistical likelihood of detecting problems immediately when they occur: we cannot have both.
And that is the issue I have. I do want both. I need efficient code, but I also need to detect all the problems that are possible to detect, reliably, when they do occur.
To anyone designing a commercial product, that is just one of the issues being balanced. There, not having both is a practical limitation, and accepting it as such and moving on to solving other problems is perfectly acceptable, even a clearly preferable attitude compared to mine. If I were a CTO or product line head honcho, I would always hire Ataradov over myself. And that is saying a lot, if you have any sort of a clue of how capable a developer I believe myself to be.
My own primary motivation on anything related to this stuff is to make sure that the stuff we have in the future is not just "better", but
more robust than the shit we have now.
I know, for a fact, that robustness is undesirable from a business point of view; planned obsolescence is not malice, but simply an obviously working business strategy, one of the few that you can mathematically show from basic principles will work.
So, when you see me rail against long-term development efforts being directed by business rules, I am railing against choosing to race toward bottom-quality, maximum-profit products. I see business (or more precisely, market competition) as one absolutely required part of a functioning society; but it too must balance/compete against humanity's own long-term interests. Thus, I do not object to business at all, just to using business as the yardstick here.
Elsewhere, I have described my own long-term efforts, currently focusing on a replacement base library for systems programming in C (and, as a variant, the base subset of functionality needed for the C/C++ freestanding environment used to develop low-resource embedded targets like microcontrollers). It is slow going, exactly because it is not a business proposal but a research project. It will not stop a developer from using pointer expressions that can scribble over unrelated memory, because the C and C++ we have right now simply do not even detect many such patterns, and I do not expect the compiler developers to be interested in helping with that either; but if my shenanigans actually work, then the replacement base library might just induce developers to use patterns that avoid those problematic cases completely.
One surprisingly hard problem I'm chewing on is how to show that passing more information on the object being accessed – often theoretically unneeded information – is worth the extra "cost".
(This ought to be interesting even to beginners at C.)
Consider languages like Fortran and Python that support array slicing. That is, instead of just passing a pointer to a consecutive sequence of elements, they can actually pass a subset, or slice, of an array. The way slicing is used in high-performance computing in Fortran shows that while it does have overhead (more parameters passed per function call), it is worth it: code of average complexity doing this kind of work tends to be more efficient when written in Fortran than when written in C.
At some point over a decade ago, I investigated this at the machine code level, read some computer science papers about efficient operations on 2D matrices, and discovered that passing the full description of how the matrix data is to be accessed makes average-complexity code both more efficient and more robust.
Here are the two structures used for double-precision floating-point data:
struct owner {
    long    refcount;   /* Number of users of this data */
    size_t  size;       /* Size of data[], in doubles */
    double  data[];     /* Flexible array member holding the elements */
};

struct matrix {
    struct owner *owner;      /* Shared data this matrix refers to */
    int     rows, cols;       /* Visible size of this matrix */
    long    rowstep, colstep; /* Element strides, in doubles */
    double *origin;           /* Element at row 0, column 0 */
};
Given struct matrix m, the expression used by the underlying code that implements the basic matrix operations to access the matrix element on row r, column c, is (m.origin + c*m.colstep + r*m.rowstep), which evaluates to a pointer to a double. At runtime, 0 <= r < m.rows and 0 <= c < m.cols must hold. As an optional consistency check, the pointers m.origin, (m.origin + (m.rows-1)*m.rowstep), (m.origin + (m.cols-1)*m.colstep), and (m.origin + (m.rows-1)*m.rowstep + (m.cols-1)*m.colstep) must all be at or above m.owner->data and below m.owner->data + m.owner->size.
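As a minimal sketch, an element accessor with those runtime checks could look like the following, using the structures above; the helper name matrix_element and the use of assert() are my own illustration, not a fixed part of the design:

#include <assert.h>

/* Returns a pointer to element (r, c) of matrix *m. The asserts encode
   the runtime validity conditions described above. */
static inline double *matrix_element(const struct matrix *m, int r, int c)
{
    assert(m && m->owner);
    assert(r >= 0 && r < m->rows);
    assert(c >= 0 && c < m->cols);
    return m->origin + (long)r * m->rowstep + (long)c * m->colstep;
}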
You might think that this means the code has to do two multiplications per matrix element access, but that is not true in practice. To optimize the efficiency of matrix operations, and to leverage the single-instruction-multiple-data instructions available on many architectures, we'll want to rearrange the data anyway, depending on the exact operation. For all but the smallest fixed-size matrices, the access order and data locality determine how long the operation takes. The arithmetic operations themselves (addition, subtraction, and multiplication in particular) are not the bottleneck; the time needed to access the elements is.
(This also makes this annoying to benchmark. If you use a synthetic microbenchmark, you are reserving the entire cache for this one operation. But in real-life code, the operation is only a part, a single step, in some chain of operations; and overusing the cache at one step often means another step has to pay a much higher price for memory access. So, optimizing the heck out of one operation can easily lead to real-world code that performs worse.)
The true strength these structures bring to the programming table is that now you can have a matrix and its transpose refer to the exact same data. You can even have a vector corresponding to its diagonal elements. Modifying an element through one modifies it in all the others, because they refer to the same data in memory. None of them is "secondary" or a "view"; each is just as primary as every other matrix referring to the same data. The refcount is optional, and normally records the number of uses of the referred-to data, including matrices and temporary uses in elementary calculations. (That is, when a function does, say, a matrix-matrix multiplication, it starts by incrementing the refcounts of the two owners of the data. After the operation is completed, the refcounts are decremented.)
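For illustration, constructing such a shared-data transpose or diagonal is just a matter of filling in the fields differently; the helper names below are hypothetical:

/* Transpose sharing the same data: swap the extents and the steps. */
struct matrix matrix_transpose_of(struct matrix m)
{
    struct matrix t = m;
    t.rows    = m.cols;
    t.cols    = m.rows;
    t.rowstep = m.colstep;
    t.colstep = m.rowstep;
    if (t.owner)
        t.owner->refcount++;  /* one more user of the shared data */
    return t;
}

/* Main diagonal, represented here as a rows x 1 matrix referring to
   the same data: each step moves one row down and one column right. */
struct matrix matrix_diagonal_of(struct matrix m)
{
    struct matrix d = m;
    d.rows    = (m.rows < m.cols) ? m.rows : m.cols;
    d.cols    = 1;
    d.rowstep = m.rowstep + m.colstep;
    d.colstep = 0;  /* unused, as there is only one column */
    if (d.owner)
        d.owner->refcount++;
    return d;
}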
As an example for one-dimensional arrays, C still does not have a standard function, or a really efficient way, to repeat a byte pattern at the beginning of an array across the rest of the array – even though that is exactly what the unused branch of every single memmove() implementation would do!
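A minimal portable sketch of that operation, assuming the first period bytes of buffer already contain the pattern (the function name repeat_prefix is mine, not a standard one); it doubles the already-filled prefix with memcpy() until the whole buffer is covered:

#include <stddef.h>
#include <string.h>

static void repeat_prefix(void *buffer, size_t period, size_t size)
{
    unsigned char *b = buffer;
    size_t filled = period;

    if (period == 0 || period >= size)
        return;

    while (filled < size) {
        /* Source and destination never overlap, so memcpy() is safe,
           and each pass doubles the filled region. */
        size_t chunk = (size - filled < filled) ? size - filled : filled;
        memcpy(b + filled, b, chunk);
        filled += chunk;
    }
}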