When I was young, it was the first edition of Data Structures and Algorithm Analysis in C by Mark Allen Weiss; nowadays I have a copy of the second edition. It works well as an introduction to different types of trees, disjoint sets, and similar abstract data structures, and it shows how to properly analyse the behaviour of algorithms (time and space complexity). If you've written command-line utilities or similar code in C, that book works well as an introduction to the scientific side, helping you develop the knowledge needed to make practical choices. Or, if you want to know how to analyse and compare algorithms, that book shows how.
I do high-performance computing, specifically molecular dynamics simulations.
For random number generation, I don't trust written publications. Even Numerical Recipes suggests some really bad linear congruential generators. (I personally have found I only need Mersenne Twisters and Xorshift generators, although I write my own implementations so that they're trivial to replace if something better or more interesting comes up.) However, the methods of generating non-uniform distributions are well described in NR and elsewhere, if you happen to need them. (On the gripping hand, things like generating 3D points with a uniform distribution on a sphere, or inside a ball, are still faster to do using Xorshift and the exclusion (rejection) method than by generating the proper distribution directly, because trigonometric functions like sin and cos are slow compared to generating a couple of extra random numbers with Xorshift or Mersenne Twister. So, don't trust that "faster" really means "faster", unless you know in which context, and with which unstated presuppositions.)
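To illustrate what I mean, here is a rough sketch only (using my own minimal xorshift64* variant rather than any particular published implementation) of generating uniform points inside the unit ball by rejection from the enclosing cube:

```c
#include <stdint.h>

/* Minimal xorshift64* generator; the state must be nonzero. */
static uint64_t prng_state = 0x9E3779B97F4A7C15ull;

static uint64_t xorshift64s(void)
{
    uint64_t x = prng_state;
    x ^= x >> 12;
    x ^= x << 25;
    x ^= x >> 27;
    prng_state = x;
    return x * 0x2545F4914F6CDD1Dull;
}

/* Uniform double in [0, 1), using the top 53 bits. */
static double prng_unit(void)
{
    return (double)(xorshift64s() >> 11) / 9007199254740992.0; /* 2^53 */
}

/* Uniform point inside the unit ball via rejection:
   draw from the enclosing cube, keep the point only if it is inside. */
static void random_in_ball(double *x, double *y, double *z)
{
    double px, py, pz;
    do {
        px = 2.0 * prng_unit() - 1.0;
        py = 2.0 * prng_unit() - 1.0;
        pz = 2.0 * prng_unit() - 1.0;
    } while (px*px + py*py + pz*pz > 1.0);
    *x = px; *y = py; *z = pz;
}
```

The loop accepts roughly 52% of candidates (the ball fills pi/6 of the cube), and on typical hardware a couple of extra xorshift draws still costs less than the sin/cos/cbrt evaluations of a direct construction.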
With cryptographically secure random number generation, the problem is initializing and maintaining state properly. (This is what once bit Debian's OpenSSL, for example.) And that is about proper software engineering, and not computer science per se; about secure coding practices, not about algorithms.
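If it helps, here is a minimal sketch of what I mean by leaving state initialization to the platform: on Linux you can just ask the kernel CSPRNG via getrandom(2) (elsewhere, read /dev/urandom or use the platform equivalent) instead of hand-rolling your own entropy collection.

```c
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <sys/random.h>   /* getrandom(), Linux/glibc 2.25+ */

/* Fetch 64 bits of seed material from the kernel CSPRNG.
   The engineering problem is exactly this kind of plumbing,
   not the algorithm behind the generator itself. */
static uint64_t seed_from_kernel(void)
{
    uint64_t seed;
    if (getrandom(&seed, sizeof seed, 0) != (ssize_t)sizeof seed) {
        perror("getrandom");
        exit(EXIT_FAILURE);
    }
    return seed;
}
```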
Microcontrollers and computers nowadays have one huge difference: caching behaviour. Computers have multi-level caching, with cache locality being a much bigger (relative) bottleneck than most realize; and I have not seen this covered well in any book. It gets even more complicated in practice, as cache line sizes vary, as does the timing behaviour of single-instruction, multiple-data (SIMD) vector extensions (like SSE and AVX on Intel/AMD processors). On the 32-bit microcontrollers I have (Cortex-M0, M4, M4F), the only real difference is between RAM and Flash accesses. (The 8-bit ones I have use a Harvard architecture: code is executed from Flash, and data stored in Flash has to be accessed via dedicated instructions, functions, or variable attributes, unlike normal RAM data access.)
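A quick way to feel the cache-locality effect on a desktop or server CPU is a toy benchmark like the one below (nothing more than a sketch; the exact numbers depend entirely on the hardware and compiler): sum the same matrix in row order and in column order.

```c
#include <stdio.h>
#include <time.h>

#define N 4096

/* Same arithmetic, very different cache behaviour: the row-order loop
   walks memory sequentially, the column-order loop strides by a whole
   row (N * sizeof(double)) per access. */
static double m[N][N];

int main(void)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            m[i][j] = (double)(i + j);

    clock_t t0 = clock();
    double sum = 0.0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += m[i][j];
    printf("row order:    %.3f s CPU (sum %g)\n",
           (double)(clock() - t0) / CLOCKS_PER_SEC, sum);

    t0 = clock();
    sum = 0.0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += m[i][j];
    printf("column order: %.3f s CPU (sum %g)\n",
           (double)(clock() - t0) / CLOCKS_PER_SEC, sum);

    return 0;
}
```

With 64-byte cache lines the column-order loop touches a new cache line on essentially every access, and is commonly several times slower even though the arithmetic is identical.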
For numerical computation, all current architectures use the IEEE-754 binary32 ("float") and binary64 ("double") hardware representations for floating-point values. This means that there is a host of "bit hacks" (stuff like radix sorting floating-point values) that are practical and portable in practice, but not usually covered in books or even in-depth material. (It is therefore quite funny how deep some people go into comparing different O(N log N) sorting algorithms for large N, as if they were not aware that a practical O(N) solution exists; it's just that the break-even point N varies a lot between architectures, because radix sorting needs some auxiliary storage, which makes it cache-sensitive.)
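As an example of the kind of bit hack I mean (assuming IEEE-754 binary32 values and ignoring NaNs), the usual trick is to map each float to an unsigned key whose integer order matches the float order, radix sort the keys, and map back:

```c
#include <stdint.h>
#include <string.h>

/* Map a binary32 float to a uint32_t key that sorts in the same order
   as the float itself.  Non-negative values get their sign bit set;
   negative values get all bits flipped, so the whole range becomes
   monotonic when compared as unsigned integers. */
static uint32_t float_to_key(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);   /* bit-exact copy, no aliasing issues */
    return (u & 0x80000000u) ? ~u : (u | 0x80000000u);
}

/* Inverse transform, to recover the float after sorting the keys. */
static float key_to_float(uint32_t k)
{
    uint32_t u = (k & 0x80000000u) ? (k & 0x7FFFFFFFu) : ~k;
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}
```

The keys then go through an ordinary byte-wise radix sort, which is where the auxiliary buckets (and hence the cache sensitivity) come in.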
My suggestion would be to pick a small subset of problems you're interested in, and approach data structures and algorithms through those. A practical approach, in other words.
My favourite example, which I have probably described elsewhere on this forum already, is to write a simple replacement for the POSIX sort command.

It turns out that the most user-friendly version (the one with the least real time used for any given data set, i.e. the least real-world time the user needs to wait for the command to complete) is the one that reads the input lines into a tree or heap. Storage I/O is slow, and doing most of the sorting while still transferring data from disk means the least real-world time taken, even if the program spends more CPU time than one that first reads everything into memory and then applies some nifty optimized sorting algorithm. Yes, "slower" is sometimes "faster". And yes, it also means we really should have a command-line "interactive sorter" and a "noninteractive sorter", depending on which we care more about: the CPU resources spent, or the wall-clock time taken.

Implementing both the read-into-tree/heap and the read-then-sort approaches, then tweaking them to run fastest for different sizes of randomized source data (and for different definitions of "fastest": CPU time used, and wall-clock time elapsed), and understanding why they behave that way on that particular hardware, is a very illuminating, albeit time-consuming, exercise.
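For reference, here is a deliberately small sketch of the read-into-heap half (no options, no locale handling, minimal error checking, POSIX getline()/strndup() assumed; it illustrates the idea, and is not a real sort(1) replacement): each line is pushed into a binary min-heap as soon as it arrives, so most of the comparison work overlaps with the slow input I/O.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static char **heap = NULL;
static size_t used = 0, allocated = 0;

static void heap_push(char *line)
{
    if (used >= allocated) {
        allocated = allocated ? 2 * allocated : 1024;
        heap = realloc(heap, allocated * sizeof heap[0]);
        if (!heap) { perror("realloc"); exit(EXIT_FAILURE); }
    }
    size_t i = used++;
    heap[i] = line;
    while (i > 0) {                       /* sift up */
        size_t parent = (i - 1) / 2;
        if (strcmp(heap[parent], heap[i]) <= 0)
            break;
        char *tmp = heap[parent]; heap[parent] = heap[i]; heap[i] = tmp;
        i = parent;
    }
}

static char *heap_pop(void)
{
    char *top = heap[0];
    heap[0] = heap[--used];
    size_t i = 0;
    while (1) {                           /* sift down */
        size_t left = 2 * i + 1, right = left + 1, smallest = i;
        if (left  < used && strcmp(heap[left],  heap[smallest]) < 0) smallest = left;
        if (right < used && strcmp(heap[right], heap[smallest]) < 0) smallest = right;
        if (smallest == i)
            break;
        char *tmp = heap[i]; heap[i] = heap[smallest]; heap[smallest] = tmp;
        i = smallest;
    }
    return top;
}

int main(void)
{
    char *line = NULL;
    size_t n = 0;
    ssize_t len;

    /* Sorting happens incrementally, while input is still arriving. */
    while ((len = getline(&line, &n, stdin)) > 0)
        heap_push(strndup(line, (size_t)len));
    free(line);

    while (used > 0) {
        char *s = heap_pop();
        fputs(s, stdout);
        free(s);
    }
    free(heap);
    return 0;
}
```

Feeding the same large file to this and to a read-everything-then-qsort() version, and comparing the user, sys, and real times reported by time(1), is exactly the kind of experiment I mean.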