Data Structure and Algorithms
Nominal Animal:

--- Quote from: Mechatrommer on June 01, 2019, 11:12:38 am ---when i was young, the 1st edition
--- End quote ---
I have a copy of the second edition of it, Data Structures and Algorithm Analysis in C by Mark Allen Weiss.  It works well as an introduction to different types of trees, disjoint sets, and similar abstract data structures, and it also shows how to analyse the behaviour of algorithms (time and space complexity) properly.  If you've written command-line utilities or similar code in C, that book works well as an introduction to the scientific side, helping you develop the knowledge needed to make practical choices.  Or, if you want to know how to analyse and compare algorithms, that book shows how.

I do high-performance computing, specifically molecular dynamics simulations.

For random number generation, I don't trust written publications.  Even Numerical Recipes suggests some really bad linear congruential generators.  (I personally have found I only need Mersenne Twisters and Xorshift generators, although I write my own implementations so that they're trivial to replace if something better or more interesting comes up.)  However, the methods of generating non-uniform distributions are well described in NR and elsewhere, if you happen to need them.  (On the gripping hand, things like generating 3D points with a uniform distribution on a sphere, or inside a ball, are still faster to do using Xorshift and the exclusion (rejection) method than by direct generation of the proper distribution, because trigonometric functions like sin and cos are slow compared to generating a couple of extra random numbers with Xorshift or a Mersenne Twister.  So, don't trust that "faster" really means "faster", unless you know in which context, and with which unstated presuppositions.)
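To make that concrete, here is a minimal sketch of the rejection approach, using one of Marsaglia's xorshift64 variants (the 13/7/17 shift triple); the choice of generator and the seeding are purely illustrative, swap in whatever you prefer:

--- Code: ---
#include <stdint.h>

/* xorshift64, one of Marsaglia's variants (shifts 13, 7, 17).
   The state must be seeded to a nonzero value before use. */
static uint64_t xorshift64(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *state = x;
}

/* Uniform double in [-1, 1), from the top 53 bits of the generator. */
static double uniform_pm1(uint64_t *state)
{
    return (double)(xorshift64(state) >> 11) * (2.0 / 9007199254740992.0) - 1.0;
}

/* Uniform point inside the unit ball, via rejection: draw points in the
   enclosing cube until one lands inside the ball. */
static void random_in_unit_ball(uint64_t *state, double *x, double *y, double *z)
{
    double px, py, pz;
    do {
        px = uniform_pm1(state);
        py = uniform_pm1(state);
        pz = uniform_pm1(state);
    } while (px*px + py*py + pz*pz >= 1.0);
    *x = px;  *y = py;  *z = pz;
}
--- End code ---

The acceptance rate is pi/6, about 52 %, so on average fewer than two tries are needed; that usually beats the direct construction involving cbrt(), sqrt() and trigonometry, and normalizing an accepted point gives a uniform direction on the sphere.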

With cryptographically secure random number generation, the problem is initializing and maintaining state properly.  (This is what once bit Debian's OpenSSL, for example.)  And that is about proper software engineering, and not computer science per se; about secure coding practices, not about algorithms.

Microcontrollers and computers nowadays have one huge difference: caching behaviour.  Computers have multi-level caching, with cache locality being a much bigger (relative) bottleneck than most realize; and I have not seen this covered well in any book.  It gets even more complicated in practice, as cacheline sizes vary, as does the timing behaviour of single-instruction-multiple-data vector extensions (like SSE and AVX on Intel/AMD processors).  On the 32-bit microcontrollers I have (Cortex M0, M4, M4F), the only real difference is between RAM and Flash accesses.  (The 8-bit ones I have use a Harvard architecture: code is executed from Flash, and dedicated instructions/functions/variable attributes are needed to access data stored in Flash, as opposed to normal RAM data access.)

For numerical computation, all current architectures use the IEEE 754 binary32 ("float") and binary64 ("double") hardware representations for floating-point values.  This means that there is a host of "bit hacks" (stuff like radix sorting floating-point values) that are practical and portable in practice, but not usually covered in books or even in in-depth material.  (It is therefore quite funny how deep some people go into comparing different O(N log N) sorting algorithms for large N, as if they were not aware that a practical O(N) solution exists; it's just that the break-even N varies a lot between architectures, because radix sorting requires some auxiliary storage, which makes it cache-sensitive.)
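As a sketch of that particular bit hack: map each float to an unsigned key that preserves the ordering, LSD radix sort the keys one byte at a time, and map back.  (Allocation failure handling is trimmed to keep it short.)

--- Code: ---
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Map a float's bit pattern to an unsigned key with the same ordering:
   flip the sign bit for non-negative values, flip all bits for negative
   ones.  Plain unsigned comparison of the keys then orders them the same
   way as numeric comparison orders the floats. */
static uint32_t float_to_key(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);    /* avoids strict-aliasing trouble */
    return u ^ ((uint32_t)(-(int32_t)(u >> 31)) | 0x80000000u);
}

static float key_to_float(uint32_t k)
{
    uint32_t u = k ^ (((k >> 31) - 1u) | 0x80000000u);
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* LSD radix sort of n floats: four passes of 256 counting buckets.
   O(N) time, but it needs a scratch array of the same size -- that
   scratch is exactly what makes it cache/TLB sensitive for large N. */
static void radix_sort_floats(float *data, size_t n)
{
    uint32_t *keys = malloc(n * sizeof *keys);
    uint32_t *tmp  = malloc(n * sizeof *tmp);
    if (!keys || !tmp) { free(keys); free(tmp); return; }

    for (size_t i = 0; i < n; i++)
        keys[i] = float_to_key(data[i]);

    for (unsigned shift = 0; shift < 32; shift += 8) {
        size_t count[256] = {0};
        for (size_t i = 0; i < n; i++)
            count[(keys[i] >> shift) & 255]++;

        size_t offset = 0;
        for (unsigned b = 0; b < 256; b++) {
            size_t c = count[b];
            count[b] = offset;
            offset += c;
        }

        for (size_t i = 0; i < n; i++)
            tmp[count[(keys[i] >> shift) & 255]++] = keys[i];

        uint32_t *swap = keys;  keys = tmp;  tmp = swap;
    }

    for (size_t i = 0; i < n; i++)
        data[i] = key_to_float(keys[i]);

    free(keys);
    free(tmp);
}
--- End code ---

Where the break-even against a good O(N log N) comparison sort lands depends on how well the counting and scatter passes fit the caches and TLB, which is exactly the architecture dependence mentioned above.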

My suggestion would be to pick a small subset of problems you're interested in, to approach data structures and algorithms from.  A practical approach, in other words.

My favourite example I probably have described on this forum elsewhere already, is to write a simple POSIX sort command replacement.  It turns out that the most user-friendly version (least real time used for any given data set; i.e., least real-world time the user needs to wait for the command to complete) is the one that reads the input lines into a tree or heap, because storage I/O is slow, and doing most of the sorting while still transferring data from disk means the least real-world time taken, even if the program spends more CPU time than if first reading everything in memory, and then using some nifty optimized sorting algorithm to sort the data.

Yes, "slower" is sometimes "faster".  And yes, it also means we really should have a command-line "interactive sorter" and a "noninteractive sorter", depending on which we care about more: the CPU resources spent, or the wall-clock time taken.

Implementing both the read-into-tree/heap and read-then-sort approaches, then tweaking them to run the fastest for different sizes of randomized source data (and for different definitions of "fastest": CPU time used, and wall-clock time elapsed), and understanding why they behave that way on that particular hardware, is a very illuminating, albeit time-consuming, exercise.
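A minimal sketch of the read-into-heap variant (a binary min-heap of lines; allocation failures, locale-aware comparison, and the rest of the real sort features are omitted for brevity):

--- Code: ---
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static char  **heap = NULL;
static size_t  heap_used = 0, heap_size = 0;

/* Sift a new line up into the min-heap. */
static void heap_push(char *line)
{
    if (heap_used >= heap_size) {
        heap_size = heap_size ? 2 * heap_size : 1024;
        heap = realloc(heap, heap_size * sizeof *heap);
    }
    size_t i = heap_used++;
    while (i > 0) {
        size_t parent = (i - 1) / 2;
        if (strcmp(heap[parent], line) <= 0)
            break;
        heap[i] = heap[parent];
        i = parent;
    }
    heap[i] = line;
}

/* Remove and return the smallest line, sifting the last one down. */
static char *heap_pop(void)
{
    char  *top  = heap[0];
    char  *last = heap[--heap_used];
    size_t i = 0;
    while (1) {
        size_t child = 2 * i + 1;
        if (child >= heap_used)
            break;
        if (child + 1 < heap_used && strcmp(heap[child + 1], heap[child]) < 0)
            child++;
        if (strcmp(last, heap[child]) <= 0)
            break;
        heap[i] = heap[child];
        i = child;
    }
    heap[i] = last;
    return top;
}

int main(void)
{
    char   *line = NULL;
    size_t  size = 0;
    ssize_t len;

    /* Most of the comparison work happens here, overlapped with the
       (slow) input I/O. */
    while ((len = getline(&line, &size, stdin)) > 0)
        heap_push(strdup(line));
    free(line);

    while (heap_used > 0) {
        char *s = heap_pop();
        fputs(s, stdout);
        free(s);
    }
    free(heap);
    return 0;
}
--- End code ---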
brucehoult:

--- Quote from: Nominal Animal on June 02, 2019, 10:37:04 pm ---For random number generation, I don't trust written publications.  Even Numerical Recipes suggests some really bad linear congruential generators.  (I personally have found I only need Mersenne Twisters and Xorshift generators, although I write my own implementations so that they're trivial to replace if something better or more interesting comes up.)

--- End quote ---

We've learned a lot since then. And also about hashing functions that are not actually cryptographically secure, but are very close to it and also super fast. The whole MurmurHash, CityHash, FarmHash, MetroHash evolution, for example.


--- Quote ---However, the methods of generating non-uniform distributions are well described in NR and elsewhere, if you happen to need them.  (On the gripping hand, things like generating 3D points with a uniform distribution on a sphere, or inside a ball, are still faster to do using Xorshift and the exclusion (rejection) method than by direct generation of the proper distribution

--- End quote ---

Sure, there are tons of things that are much faster to check than to generate directly, and as long as the area/volume is a reasonable proportion of the unit square/cube the time wasted by rejected points is well worth it. So Mote it be.


--- Quote ---Microcontrollers and computers nowadays have one huge difference: caching behaviour.  Computers have multi-level caching, with cache locality being a much bigger (relative) bottleneck than most realize; and I have not seen this covered well in any book.

--- End quote ---

Even fewer people realise that the reason their algorithm is slow is not even memory caching, but TLB. Many modern ARM processors can only jump around randomly between a maximum of 32 4kB memory pages before you start thrashing the TLB.

Even, for example, Intel Skylake can reference only 64 4kB pages from the L1 data TLB -- and it's only 4-way associative, so with badly-chosen access patterns, as few as five frequently-accessed bytes/words in different pages can cause repeated TLB misses. The L2 TLB references 1536 pages (6144 kB). An L1 TLB miss that hits in the L2 TLB incurs a 9 cycle delay.

Even if you can't get your frequently-used data into the same cache line, at least get it into the same 4kB page, people!

(megapages/gigapages do alleviate this if the virtual->physical mapping has large contiguous regions, and your OS supports identifying and exploiting this automatically)
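If you want to see the effect for yourself, a rough microbenchmark sketch along these lines works (page count, round count and timing method are arbitrary; the exact numbers depend entirely on your hardware):

--- Code: ---
/* Touch one byte per 4 kB page, first walking the pages in order and
   then hopping around in a shuffled order.  Both loops do the same
   number of loads; the shuffled walk over far more pages than the
   L1 dTLB covers is typically noticeably slower. */
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <time.h>

#define PAGE    4096
#define NPAGES  16384            /* 64 MB worth of pages */
#define ROUNDS  64

static double seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void)
{
    unsigned char   *mem   = malloc((size_t)NPAGES * PAGE);
    size_t          *order = malloc(NPAGES * sizeof *order);
    volatile uint64_t sink = 0;

    if (!mem || !order)
        return 1;

    /* Touch every page once, so they all actually exist. */
    for (size_t p = 0; p < NPAGES; p++)
        mem[p * PAGE] = (unsigned char)p;

    for (size_t p = 0; p < NPAGES; p++)
        order[p] = p;

    double t0 = seconds();
    for (int r = 0; r < ROUNDS; r++)
        for (size_t p = 0; p < NPAGES; p++)
            sink += mem[order[p] * PAGE];
    double sequential = seconds() - t0;

    /* Fisher-Yates shuffle with a fixed seed, for the scattered walk. */
    srand(12345);
    for (size_t p = NPAGES - 1; p > 0; p--) {
        size_t q = (size_t)rand() % (p + 1);
        size_t t = order[p];  order[p] = order[q];  order[q] = t;
    }

    t0 = seconds();
    for (int r = 0; r < ROUNDS; r++)
        for (size_t p = 0; p < NPAGES; p++)
            sink += mem[order[p] * PAGE];
    double scattered = seconds() - t0;

    printf("sequential %.3f s, scattered %.3f s (sink %llu)\n",
           sequential, scattered, (unsigned long long)sink);
    free(mem);
    free(order);
    return 0;
}
--- End code ---

The exact ratio is machine-specific; the point is only that the two loops do identical work, yet the scattered one pays extra for every page it revisits.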


--- Quote ---My suggestion would be to pick a small subset of problems you're interested in, to approach data structures and algorithms from.  A practical approach, in other words.

--- End quote ---

My personal bugbear is people still tediously learning (and coding!) AVL and RedBlack trees!

The properties of caches, TLBs, and VM make binary trees basically obsolete, at least if you care about performance on large data sets. It makes much more sense to use what were originally developed as on-disk data structures, such as B-trees. If you don't care about iterating through the data in sorted order then hash tables are even better -- power-of-two sized hash tables, indexed by a hash code generated using one of those new hash functions I mentioned above (or, if you have SHA hardware, that may be faster).
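For the shape of the thing, a minimal sketch of such a table: open addressing with linear probing, a power-of-two size so indexing is just a mask, and a splitmix64-style finalizer standing in for a real string hash (integer keys, a fixed size, and no resizing or deletion, just to keep the sketch short):

--- Code: ---
#include <stdint.h>
#include <string.h>

#define TABLE_BITS  16
#define TABLE_SIZE  (1u << TABLE_BITS)
#define TABLE_MASK  (TABLE_SIZE - 1u)
#define EMPTY       UINT64_MAX          /* reserved "no key here" marker */

static uint64_t slot_key[TABLE_SIZE];
static uint64_t slot_val[TABLE_SIZE];

/* splitmix64-style finalizer: scrambles the key bits so that the low
   bits used for indexing depend on all of the input bits. */
static uint64_t mix64(uint64_t x)
{
    x ^= x >> 30;  x *= UINT64_C(0xBF58476D1CE4E5B9);
    x ^= x >> 27;  x *= UINT64_C(0x94D049BB133111EB);
    x ^= x >> 31;
    return x;
}

static void table_init(void)
{
    memset(slot_key, 0xFF, sizeof slot_key);   /* mark every slot EMPTY */
}

static void table_put(uint64_t key, uint64_t val)
{
    uint64_t i = mix64(key) & TABLE_MASK;
    while (slot_key[i] != EMPTY && slot_key[i] != key)
        i = (i + 1) & TABLE_MASK;              /* linear probe */
    slot_key[i] = key;
    slot_val[i] = val;
}

static int table_get(uint64_t key, uint64_t *val)
{
    uint64_t i = mix64(key) & TABLE_MASK;
    while (slot_key[i] != EMPTY) {
        if (slot_key[i] == key) { *val = slot_val[i]; return 1; }
        i = (i + 1) & TABLE_MASK;
    }
    return 0;
}
--- End code ---

Call table_init() once before use; a real implementation would also track occupancy and resize (or at least refuse inserts) before the table fills up, and would hash the actual key bytes rather than a 64-bit integer.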

If you really really feel you need a binary tree somewhere, look up "Scapegoat Tree". It doesn't need any extra data in each node, takes far less code and is simpler to write (even from memory, for example in an exam or competition), and also outperforms AVL and RedBlack by a lot.


--- Quote ---My favourite example I probably have described on this forum elsewhere already, is to write a simple POSIX sort command replacement.  It turns out that the most user-friendly version (least real time used for any given data set; i.e., least real-world time the user needs to wait for the command to complete) is the one that reads the input lines into a tree or heap, because storage I/O is slow, and doing most of the sorting while still transferring data from disk means the least real-world time taken, even if the program spends more CPU time than if first reading everything in memory, and then using some nifty optimized sorting algorithm to sort the data.

--- End quote ---

That makes sense if your RAM is big enough to hold the file being sorted, but heap and tree data structures are not very friendly to Virtual Memory.

Back in 1985 I had a 28 MB data file that I needed to (re)sort fairly frequently, on a Data General MV/10000 with 4 MB RAM (and about 30 users).

I found that reading the data into an array in (virtual) memory and then using quicksort took about two minutes, vs over an hour for the system sort program.

Quicksort is very Virtual Memory friendly, as it has two pointers stepping one item at a time through the array (one backwards), and it examines (and maybe swaps) every item in every cache line or page of memory that it touches. Once a particular partition gets down to below your physical RAM size it will automatically be done entirely in-memory, and very quickly.
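For reference, the access pattern in question looks roughly like this (Hoare-style partitioning; recursing into the smaller half and looping on the larger keeps the stack depth at O(log n)):

--- Code: ---
/* Quicksort with Hoare partitioning: the two index pointers i and j walk
   sequentially towards each other, so every cache line and page touched
   gets fully used before it is evicted. */
static void quicksort(long *a, long lo, long hi)
{
    while (lo < hi) {
        long pivot = a[lo + (hi - lo) / 2];
        long i = lo - 1, j = hi + 1;

        for (;;) {
            do { i++; } while (a[i] < pivot);
            do { j--; } while (a[j] > pivot);
            if (i >= j)
                break;
            long t = a[i];  a[i] = a[j];  a[j] = t;
        }

        if (j - lo < hi - j) {
            quicksort(a, lo, j);    /* smaller half: recurse */
            lo = j + 1;             /* larger half: iterate */
        } else {
            quicksort(a, j + 1, hi);
            hi = j;
        }
    }
}
--- End code ---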
Nominal Animal:

--- Quote from: brucehoult on June 02, 2019, 11:56:56 pm ---We've learned a lot since then. And also about hashing functions that are not actually cryptographically secure, but are very close to it and also super fast. The whole MurmurHash, CityHash, FarmHash, MetroHash evolution, for example.
--- End quote ---
Yes, exactly; and software-engineering-wise, we buttress even cryptographic hash functions by deliberately making rare operations slow through repeated hashing. (For example, password hashing should be slow, so that dictionary attacks and the like are too costly, time-wise, in practice.)

I like that approach a lot.  We trust the science, but also apply sensible engineering principles to give us a practical buffer or leeway, in case the research turns out to be faulty somehow.  I would love to see this applied more in real-world programming.
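As a toy illustration of the "deliberately slow by iteration" idea (the hash here is 64-bit FNV-1a purely as a stand-in; a real system would use a purpose-built password hash such as bcrypt, scrypt, Argon2 or PBKDF2, with a per-user salt):

--- Code: ---
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* FNV-1a, 64-bit variant; only here as a simple, well-known stand-in. */
static uint64_t fnv1a(const void *data, size_t len, uint64_t h)
{
    const unsigned char *p = data;
    while (len--) {
        h ^= *p++;
        h *= UINT64_C(0x100000001B3);            /* FNV 64-bit prime */
    }
    return h;
}

/* Iterate the hash many times so that each guess costs real time. */
static uint64_t stretched_hash(const char *password, const char *salt,
                               unsigned long rounds)
{
    uint64_t h = UINT64_C(0xCBF29CE484222325);   /* FNV offset basis */
    h = fnv1a(salt, strlen(salt), h);
    h = fnv1a(password, strlen(password), h);
    while (rounds--)
        h = fnv1a(&h, sizeof h, h);
    return h;
}
--- End code ---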


--- Quote from: brucehoult on June 02, 2019, 11:56:56 pm ---Sure, there are tons of things that are much faster to check than to generate directly, and as long as the area/volume is a reasonable proportion of the unit square/cube the time wasted by rejected points is well worth it.
--- End quote ---
I definitely agree. My point was that few of the books/lessons I've seen cover this aspect, or they cover it under impractical assumptions (like "if trigonometric and logarithmic/exponential functions were as fast as multiplication and division"), usually without even stating those assumptions.  This leads to learners getting a completely wrong intuitive grasp of the subject at hand.  Then, later on, they either get super frustrated when things don't make sense, or worse yet, they spread their incorrect understanding to others.


--- Quote from: brucehoult on June 02, 2019, 11:56:56 pm ---Even fewer people realise that the reason their algorithm is slow is not even memory caching, but TLB.
--- End quote ---
Exactly!  This is exacerbated by poor testing ("it works fine on my machine!") on common architectures like x86-64, where the behaviour/limits vary a lot.

Typically, you see someone microbenchmarking an operation, tweaking it to be "perfect" for a specific compiler version, specific options and hardware, with nothing else running; then, when it is incorporated into a larger application or system, the slowdown its poor cache behaviour causes to the overall operation is blamed on somebody else.


--- Quote from: brucehoult on June 02, 2019, 11:56:56 pm ---My personal bugbear is people still tediously learning (and coding!) AVL and RedBlack trees!

--- End quote ---
They have their uses in OS and database implementation, but not much else, in my experience.  Oh, and when learning how to debug buggy code!  (That is, having a buggy binary-search/AVL/Red-Black/Splay tree implementation, and trying to locate the bug(s).)


--- Quote from: brucehoult on June 02, 2019, 11:56:56 pm ---
--- Quote ---My favourite example I probably have described on this forum elsewhere already, is to write a simple POSIX sort command replacement. [...]

--- End quote ---
That makes sense if your RAM is big enough to hold the file being sorted, but heap and tree data structures are not very friendly to Virtual Memory.
--- End quote ---
As it is a simplified POSIX command replacement exercise, having enough RAM (+swap) is a sensible assumption; most sort implementations do that anyway.

However, my point was exactly that even though it is technically inefficient to read the input into a tree or heap, the end result is more human-friendly: less elapsed real-world time to get the result, even if more CPU time is used.  It is somewhat counterintuitive, and thus important to understand, IMHO.  (It also teaches one to ask better questions: when somebody needs something to be fast, you know to ask "do you need it to complete as soon as possible, or to be as efficient as possible (use as few CPU/memory resources as possible)?"  Only a PHB says both, BTW.)


--- Quote from: brucehoult on June 02, 2019, 11:56:56 pm ---I found that reading the data into an array in (virtual) memory and then using quicksort took about two minutes, vs over an hour for the system sort program.
--- End quote ---
Similar findings are still very applicable in both embedded computing and distributed high-performance computing.  At the small end, memory is still quite limited (most routers have megabytes, not gigabytes of RAM, unlike phones, desktop computers, and servers); at the large end, latencies make people wait, and data transfer bandwidth is limited compared to the huge, huge amount of data one can process.

I still occasionally encounter related problems in real life (in large, distributed simulations): variants of external sorting (or rather external partitioning, instead of full sorting: the full dataset is spread across a number of machines, and needs to be re-distributed to balance the amount of computation among them).  Currently, it is "solved" by transferring all the data to a single node, and having that single node parcel it back out to the other nodes.  You'd think that is okay, until you factor in the amount of real-world time wasted across entire clusters, time and again, just because the problem was considered not "important enough" for some programmers to solve; and that at least that one node then needs enough memory to hold the entire dataset.  Most people in my field simply say "InfiniBand is fast enough, don't worry about it", but I'm not willing to take handwaving as evidence, as I've found such handwaving often stems from a stinking pile of poo rather than from knowledge.
coppice:

--- Quote from: brucehoult on June 02, 2019, 11:56:56 pm ---My personal bugbear is people still tediously learning (and coding!) AVL and RedBlack trees!

--- End quote ---
While these algorithms in their bare forms are a lot less useful than they used to be, I think they are an excellent starting point to get the student thinking about the nature of the problems these algorithms address. I don't think you can leap to memory aware algorithms without covering these basics.
brucehoult:

--- Quote from: coppice on June 03, 2019, 03:41:19 pm ---
--- Quote from: brucehoult on June 02, 2019, 11:56:56 pm ---My personal bugbear is people still tediously learning (and coding!) AVL and RedBlack trees!

--- End quote ---
While these algorithms in their bare forms are a lot less useful than they used to be, I think they are an excellent starting point to get the student thinking about the nature of the problems these algorithms address. I don't think you can leap to memory aware algorithms without covering these basics.

--- End quote ---

I think teaching simple binary trees is certainly useful, while noting that they can become unbalanced, e.g. degenerate to a linear list if elements are inserted in sorted order, but not attempting to balance them -- at least not by the fiddly little rotations and twirls used in AVL or RedBlack. I don't think that stuff benefits anyone.

It *is* useful to know you can re-balance an arbitrary binary tree in O(N) time, by simply converting it to degenerate list form down, say, the right links, and then recursively building a perfect tree: build a (left child) perfect tree from the first N/2 items, take the next item as the root, and recursively build a (right child) perfect tree from the remaining N/2 items.

(that's what Scapegoat Tree does when it notices the tree is too much out of balance)
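In code, a minimal sketch of that flatten-then-rebuild step might look like this (the node layout is just for illustration; the flatten step recurses to the tree height, so a badly degenerate tree would want an iterative version):

--- Code: ---
#include <stddef.h>

struct node {
    struct node *left, *right;
    int key;
};

/* Flatten the subtree into a right-linked list in sorted order,
   appending 'tail' after it; counts the nodes through *count. */
static struct node *flatten(struct node *root, struct node *tail, size_t *count)
{
    if (!root)
        return tail;
    struct node *head = flatten(root->left, root, count);
    root->left  = NULL;
    root->right = flatten(root->right, tail, count);
    (*count)++;
    return head;
}

/* Build a perfectly balanced tree from the first n nodes of the list;
   *list is advanced past the nodes that were consumed. */
static struct node *build_balanced(struct node **list, size_t n)
{
    if (n == 0)
        return NULL;
    struct node *left = build_balanced(list, n / 2);
    struct node *root = *list;
    *list = root->right;
    root->left  = left;
    root->right = build_balanced(list, n - n / 2 - 1);
    return root;
}

static struct node *rebalance(struct node *root)
{
    size_t n = 0;
    struct node *list = flatten(root, NULL, &n);
    return build_balanced(&list, n);
}
--- End code ---

rebalance() is the same O(N) operation a scapegoat tree applies to whichever subtree it finds too far out of balance.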