It seems that a counterexample is in order: the Verlet list, also known as a neighbour list.
The practical implementation is a single honking big array. For each particle, it lists the indexes of the nearby particles. Each subsequence in the array begins with the number of neighbours for that particle, followed by the indexes of those neighbouring particles. The subsequences are stored in order of particle index:
║ Na │ a1 │ a2 │ ... │ aNa ║ Nb │ b1 │ b2 │ ... │ bNb ║ Nc │ c1 │ c2 │ ... │ cNc ║ and so on; the above shows three subsequences in the array. (I'm using 1-based indexing, because you see this most commonly in Fortran code.)
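As a rough illustration of that layout, here is a minimal sketch in C (the names and the 0-based indexing are my own, not from any particular simulator); the helper just locates one particle's subsequence in the flat array:

    #include <stddef.h>
    #include <stdint.h>

    /* Packed neighbour list: for each particle, in particle-index order,
       a count Ni followed by Ni neighbour indexes, all in one flat array:
         { Na, a1, a2, ..., aNa, Nb, b1, ..., bNb, Nc, c1, ..., cNc, ... } */

    /* Skip forward to the start of particle i's subsequence.  This is rarely
       needed in practice: the hot loops simply walk the array from the start. */
    static size_t sublist_start(const uint32_t *list, size_t i)
    {
        size_t pos = 0;
        while (i-- > 0)
            pos += 1 + list[pos];   /* skip the count plus that many indexes */
        return pos;
    }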
Why not use a linked list (of subsequences), using pointers? Why not have an array per particle?
The answer is simple: cache locality.
In a molecular dynamics simulation, the darned particles don't usually stay put. Sure, if you have a solid block of some simulated matter, those atoms do stay in place; but simulating such a block is not very interesting. If it melts, or there is a big impact, or anything else interesting happens, the atoms move quite a bit. So, unless you do something radical that current simulators do not tend to do, and periodically rearrange the structures describing each particle's location, velocity, and acceleration components (used by whatever predictor-corrector method is used to integrate the particle trajectories) so that particles near each other in the simulation are also near each other in memory, the particles you access will not be sequential in memory.
If you have the neighbour list as a single sequential array, you can process it linearly. The neighbour list itself then has a relatively small cache footprint (your compiler can even emit very accurate prefetch instructions for it), leaving most of the cache for the scattered particle data. One might think that because the particle data is already scattered in memory, it would not hurt much to have the neighbour list scattered as well, but that would be wrong: cache access patterns matter, a lot.
Furthermore, if a single simulation has fewer than 2³² = 4,294,967,296 particles, you can use 32-bit unsigned integers as particle indexes even on 64-bit architectures, "packing" the neighbour information even tighter and leaving even more of the cache for the scattered particle data.
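To make that access pattern concrete, here is a hypothetical pair-interaction loop in C; add_pair_force is a placeholder name of mine, standing in for whatever force computation the simulator actually does:

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical pair kernel: in a real simulator this would read the
       (scattered) position data of particles i and j and accumulate forces.
       The name is a placeholder, not from any particular code base. */
    extern void add_pair_force(size_t i, uint32_t j);

    /* One pass over the whole neighbour list.  The 32-bit list itself is read
       strictly sequentially (small cache footprint, easy to prefetch), while
       the particle data gets touched in whatever scattered order the
       neighbour indexes dictate. */
    static void compute_forces(const uint32_t *list, size_t num_particles)
    {
        size_t pos = 0;
        for (size_t i = 0; i < num_particles; i++) {
            uint32_t count = list[pos++];          /* Ni for particle i */
            for (uint32_t k = 0; k < count; k++)
                add_pair_force(i, list[pos++]);    /* neighbour index j */
        }
    }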
(In physics simulations using classical potential models, each atom typically has up to a few hundred neighbours, and a simulation has up to a few million atoms per CPU core. To get any results, a LOT of time steps are needed, so to get results in reasonable time, the simulations tend to be distributed across a number of CPU cores, each core processing a subvolume of the entire simulation. Chemical simulations using potential models tend to differ, and I'm not very familiar with those. Quantum mechanical simulations model the electron densities as fields, and are currently limited to a few hundred to a couple of thousand electrons; they are also much, much harder to distribute over many physical machines.)
In practice, the difference between neighbour lists where the sublists use pointers (whether as a linked list, or pointed to from each particle), and neighbour lists where everything is a single linear array, varies depending on how much the particles' data structures in memory reflect their neighbour relationships in the simulation, but the single linear array is always faster.
Does that mean that pointers are useless? Definitely not: I've already shown some examples of code that you cannot do efficiently with arrays (basically any data structure whose members' types and sizes vary). Pointers and arrays are just tools provided by your programming language. They have their uses, and your task as a programmer is to choose the appropriate tool (and algorithm!) when solving a particular problem.
If you haven't yet needed to use some tool, it means either that you have only worked on the specific types of problems where that tool simply isn't needed to achieve an efficient solution, or that you don't know enough algorithms and approaches to solve the problems at hand. I do like the saying that if all you have is a hammer, all problems start to look like nails; but with programming, the situation is more complex. The problems are abstract, and it is hard to see whether you are using a proverbial "hammer" or a "screwdriver" to drive in a "screw"; you need experience with lots of different types of problems to learn to estimate that.
C is also a funny language in that, because it is so low-levelish, programmers often get stuck microbenchmarking irrelevant details. There is absolutely no sense in comparing array-based and pointer-based linked list traversal unless those lists contain real-world data and the traversal obtains a real-world result. That is because microbenchmarking does not prove anything practical: it is only a preliminary indicator. In practice, our processors (aside from most microcontrollers with built-in SRAM) tend to be bound by memory bandwidth and cache effects; and memory organization and cache usage depend on the dataset used.
It is not rare to microbenchmark two implementations, say sorting algorithms, and find that while the microbenchmark shows one is faster than the other, in real-world applications the opposite is true. It is not always easy to explain why (although "memory and cache effects" are certainly at the core of it). Even then, it is the real-world performance that matters. So, please, don't get too hung up on microbenchmarking, because it really is just a preliminary indication, not a proof of anything (except of that microbenchmark itself).
Microcontrollers are even funkier, because there is so much variance between different types. The one thing they tend to have in common is the small amount of RAM available. Therefore, when programming microcontrollers in C (or C++), one is usually more concerned about the total memory footprint -- the amount of memory needed to achieve the desired results -- than about speed.