Searched elements are 32-byte structures stored in an array.
The search index is an integer, so that the compare function has as little influence as possible.
Trees are reasonably fast. But if you keep adding and removing many elements over a long period of time in a complex program, memory usage may grow well beyond what you expect, even for balanced trees. Things get worse if the tree is not balanced. The cause is that trees, depending on the implementation style, either require O(lg(n)²) memory or fragment the address space. The latter can be remedied somewhat, at the cost of complicating the implementation, for example with a node pool allocator as sketched below. Referential locality also suffers.
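A minimal sketch of that remedy, under my own assumptions (the Node layout and slab size are illustrative, not from this thread): tree nodes come from a grow-only pool of contiguous slabs and are recycled through a free list, so repeated insert/remove cycles stop churning the general-purpose allocator.

#include <cstddef>
#include <vector>

struct Node {
    int key;
    Node *left;
    Node *right;
    Node *next_free;   // intrusive free-list link, reused while the node is free
};

class NodePool {
    std::vector<std::vector<Node>> slabs_;  // large contiguous blocks of nodes
    Node *free_list_ = nullptr;             // nodes returned by release()
    std::size_t used_ = 0;                  // nodes handed out from the last slab
    static constexpr std::size_t kSlabNodes = 4096;  // illustrative slab size

public:
    Node *alloc() {
        if (free_list_) {                   // recycle freed nodes before growing
            Node *n = free_list_;
            free_list_ = n->next_free;
            return n;
        }
        if (slabs_.empty() || used_ == kSlabNodes) {
            slabs_.emplace_back(kSlabNodes); // one big allocation per slab
            used_ = 0;
        }
        return &slabs_.back()[used_++];
    }
    void release(Node *n) {                 // O(1), no heap traffic
        n->next_free = free_list_;
        free_list_ = n;
    }
};

Freed nodes are never returned to the OS, which trades peak memory for stable allocator behavior; whether that trade is acceptable depends on the workload.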
If I understood correctly, the entries you compare are small – integers – but not consecutive in memory. If so, your cache utilization may be poor, and some of the approaches you tried will be bounded by memory bandwidth.
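To make that concrete, here is one common layout fix, sketched under my own assumptions about the data (Entry stands in for the 32-byte structure described above; everything else is illustrative): keep the integer keys in their own sorted, contiguous array so the search walks dense cache lines, and dereference into the big structures only on a hit.

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Entry {                  // stand-in for the 32-byte structure
    std::uint32_t id;
    char payload[28];
};

struct KeyIndex {
    std::vector<std::uint32_t> keys;    // sorted, contiguous: good cache use
    std::vector<const Entry *> slots;   // parallel array, same order as keys

    const Entry *find(std::uint32_t key) const {
        auto it = std::lower_bound(keys.begin(), keys.end(), key);
        if (it == keys.end() || *it != key)
            return nullptr;             // key not present
        return slots[static_cast<std::size_t>(it - keys.begin())];
    }
};

With 4-byte keys, a 64-byte cache line now holds 16 candidate keys instead of two 32-byte entries, which is exactly the bandwidth effect described above.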
// start time
t0 = PerfCountStamp();
for (k = 0; k < runs; k++)
{
    for (n = 0; n < map_size; n++)
    {
        index = map[n];
        ASSERT(index < tbl_size);
        // The search function is inlined; "data" is the search data structure
        // (std::map, hash table, ...).
        // The return value is a pointer to the matching user data entry.
        lp = search_fp(tbl, tbl_size, index, data);
        ASSERT(lp);
        id = lp->id;
        ASSERT(id == index);
        // Accumulate the result so the compiler cannot optimize the loop away.
        sum += id;
    }
}
// end time
t1 = PerfCountStamp();
td = t1 - t0;   // elapsed time
sum += td;
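PerfCountStamp() is the poster's own timer and is not shown; a portable stand-in, assuming the intent is a monotonic high-resolution counter, could look like this:

#include <chrono>
#include <cstdint>

static std::uint64_t PerfCountStamp(void)
{
    using namespace std::chrono;
    // Monotonic clock, reported in nanoseconds; td above then measures a
    // nanosecond difference rather than raw performance-counter ticks.
    return static_cast<std::uint64_t>(
        duration_cast<nanoseconds>(
            steady_clock::now().time_since_epoch()).count());
}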
Agree with the above. Cache is most often the determining factor on modern systems.
Micro-benchmarking, as you said, is often pointless. It just gives us a micro-reality. :)
IMO, what should matter is the usual: algorithm complexity for large N; and for smaller N, ease of implementation, cache considerations, and the particular tradeoffs of a given application.

What matters is the resulting speed. Normally these algorithms are implemented in a standard library.
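For reference, the standard-library forms that reply points at, with illustrative key/value types:

#include <cstdint>
#include <map>
#include <unordered_map>

// Both containers do the searching for you; the difference is the
// underlying algorithm and its cache behavior.
int lookup_tree(const std::map<std::uint32_t, int> &m, std::uint32_t key)
{
    auto it = m.find(key);              // balanced tree: O(log n) compares
    return it != m.end() ? it->second : -1;
}

int lookup_hash(const std::unordered_map<std::uint32_t, int> &m,
                std::uint32_t key)
{
    auto it = m.find(key);              // hash table: O(1) expected
    return it != m.end() ? it->second : -1;
}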
Just for the record, and by no means as a general approach: I almost always use binary search by default, unless there's an excellent reason not to, which I've personally found rare. I've more than once considered using hash tables for a particular application, only to end up with binary search, because hash tables are not the perfect tool either: the probability of collisions increases as N gets large, and the overhead can sometimes be hard to justify. Hash tables are also not always very good for cache use. So, it depends.
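For concreteness, the kind of default being described, a classic hand-rolled binary search over a sorted array (a sketch, not that poster's code):

#include <cstddef>
#include <cstdint>

// Returns the index of 'key' in sorted[0..n), or n if the key is absent.
static std::size_t binary_find(const std::uint32_t *sorted, std::size_t n,
                               std::uint32_t key)
{
    std::size_t lo = 0, hi = n;                // key, if present, is in [lo, hi)
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;  // midpoint without overflow
        if (sorted[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (lo < n && sorted[lo] == key) ? lo : n;
}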
Note that using the 'bsearch()' standard function can be slightly misleading. First there is of course the overhead of calling a comparison callback for every probe, and it may also be implemented in different ways depending on N. I know that qsort() is not necessarily a quicksort for small N on some implementations, for instance; the same could be true for bsearch().
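To illustrate the callback overhead being described (the table and key here are illustrative):

#include <cstdint>
#include <cstdlib>

// bsearch() calls this out-of-line for every probe; an inlined comparison
// in a hand-rolled search avoids that call entirely.
static int cmp_u32(const void *a, const void *b)
{
    std::uint32_t x = *static_cast<const std::uint32_t *>(a);
    std::uint32_t y = *static_cast<const std::uint32_t *>(b);
    return (x > y) - (x < y);
}

static const std::uint32_t *find_u32(const std::uint32_t *table,
                                     std::size_t count, std::uint32_t key)
{
    return static_cast<const std::uint32_t *>(
        std::bsearch(&key, table, count, sizeof *table, cmp_u32));
}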
Cache behavior and branch prediction are two factors that can easily skew the results of such comparisons unless the benchmarks are deliberately set up to suppress them. Of course, in real life you are supposed to take advantage of these effects, not suppress them. However, the magnitude of their influence is heavily platform-dependent.