Author Topic: Data Structure and Algorithms  (Read 2682 times)


Offline ankitdixitTopic starter

  • Newbie
  • Posts: 3
  • Country: in
Data Structure and Algorithms
« on: May 23, 2019, 05:00:55 am »
Hello all, I am new to this forum and want to know how I can start learning data structures and algorithms. Any online books or other online references?
 

Offline agehall

  • Frequent Contributor
  • **
  • Posts: 389
  • Country: se
Re: Data Structure and Algorithms
« Reply #1 on: May 23, 2019, 06:58:14 am »
Google whatever you want to learn more about. If you really want to learn this stuff well, buy the Donald Knuth books. They are still amongst the best resources to learn about algorithms imho.
 

Offline agehall

  • Frequent Contributor
  • **
  • Posts: 389
  • Country: se
Re: Data Structure and Algorithms
« Reply #2 on: May 23, 2019, 07:15:36 am »
From beginner to give up >:D?

It certainly depends on the ambition level, but those books were the first books I bought on the subject and I found them excellent. While somewhat abstract, after reading them a couple of times they made sense and gave me a fairly good understanding of the subject. Even if you don't comprehend 100% of the content (or even 50%), they are still very good IF you really want to learn the subject in depth.

If one is just looking for an overview and a lighter introduction, the DK books are not the way to go, of course. But in that case, simple Google searches will probably give pointers to easy-to-understand introductions to different algorithms.
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 4468
  • Country: nz
Re: Data Structure and Algorithms
« Reply #3 on: May 23, 2019, 07:15:41 am »
This is where I started (in fact, with an older version).

http://www.ethoberon.ethz.ch/WirthPubl/AD.pdf

It uses a language not commonly used today, but if you call yourself a programmer you'll be able to convert the examples into C++ or Python easily enough.

Or you can find a compiler here http://freepages.modula2.org/compi.html
 

Offline jeremy

  • Super Contributor
  • ***
  • Posts: 1079
  • Country: au
Re: Data Structure and Algorithms
« Reply #4 on: May 23, 2019, 11:45:46 am »
I found the Robert Sedgewick books “Algorithms in C” and “Algorithms in C++” to be excellent. I think the Knuth books would be like learning the basics of hiking while trying to summit Everest.
 

Offline mariush

  • Super Contributor
  • ***
  • Posts: 5135
  • Country: ro
  • .
Re: Data Structure and Algorithms
« Reply #5 on: May 23, 2019, 12:20:59 pm »
Hello all, I am new to this forum and want to know how I can start learning data structures and algorithms. Any online books or other online references?

Sent you a PM with information (a ftp account to freely download books, ~ 4 GB of algorithms and programming, ~15 GB of electronics related material)

Anyone else interested can pm me for account info.
 

Offline rstofer

  • Super Contributor
  • ***
  • Posts: 9933
  • Country: us
Re: Data Structure and Algorithms
« Reply #6 on: May 24, 2019, 02:25:54 pm »
This isn't 'online', it's a rehash of the way I learned the stuff back around '80.

In my view, "Algorithms + Data Structures = Programs" by Wirth is about as good as it gets.  It is based on Pascal so it's probably necessary to install a Pascal compiler.

The sheer elegance of Pascal, plus the language details, makes developing algorithms a joy.  They can later be ported to whatever language is in use.  For a learning language, Pascal is the way to go.

Windows 10 users can install Bash on Ubuntu on Windows and then install FPC, the Free Pascal Compiler.  There are other ways to do it; the nicest is to run full-blown Linux, where your choice of editors will be much greater.  Even when working at the command line, I prefer a full-featured editor like gedit.

This specific version is the one you want.  Later editions use Modula-2 (and, later still, Oberon):

https://www.amazon.com/Algorithms-Structures-Prentice-Hall-Automatic-Computation/dp/0130224189

When you get to the back of the book, Wirth presents a tiny Pascal compiler in just a few pages of code.  You will see the elegance of Pascal totally exposed.  Sets are a very powerful construct and are lacking in most languages.  Yes, you can work around that, but you shouldn't have to.
« Last Edit: May 24, 2019, 02:28:23 pm by rstofer »
 

Offline codingwithethanol

  • Regular Contributor
  • *
  • Posts: 73
  • Country: us
  • always tilted
Re: Data Structure and Algorithms
« Reply #7 on: May 25, 2019, 01:38:10 am »
Google is your friend

I've been coding for 7 years and I learned everything I know by simply googling whatever I needed to learn and checking every resource on the first page. Don't use libraries when you start, because they don't teach you anything. It's better to write your own linked list than to just import a lib. It also teaches you by failure; if your code doesn't work, you learn a lot by spending a couple of hours figuring out what you did wrong.

Good luck, and stay motivated!


its your boi, dj ethanol
 

Online radiolistener

  • Super Contributor
  • ***
  • Posts: 3993
  • Country: ua
Re: Data Structure and Algorithms
« Reply #8 on: May 25, 2019, 02:51:53 am »
You can start by learning the bubble sort algorithm, then move on to more efficient sorting algorithms, and then learn lists, trees, etc.
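For example, a minimal bubble sort sketch in C (assuming a plain int array), just to have something concrete to start from:

#include <stdio.h>

/* Bubble sort: repeatedly swap adjacent out-of-order pairs.
   O(n^2) comparisons, but short and easy to verify. */
static void bubble_sort(int *a, size_t n)
{
    int swapped = 1;
    while (swapped && n > 1) {
        swapped = 0;
        for (size_t i = 1; i < n; i++) {
            if (a[i - 1] > a[i]) {
                int tmp  = a[i - 1];
                a[i - 1] = a[i];
                a[i]     = tmp;
                swapped  = 1;
            }
        }
        n--;   /* the largest element has bubbled to the end */
    }
}

int main(void)
{
    int data[] = { 5, 2, 9, 1, 7 };
    size_t n = sizeof data / sizeof data[0];

    bubble_sort(data, n);
    for (size_t i = 0; i < n; i++)
        printf("%d ", data[i]);
    printf("\n");
    return 0;
}

Once that is comfortable, comparing it against the standard library's qsort() on larger arrays makes the O(n^2) versus O(n log n) difference very tangible.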
 

Offline 0culus

  • Super Contributor
  • ***
  • Posts: 3032
  • Country: us
  • Electronics, RF, and TEA Hobbyist
Re: Data Structure and Algorithms
« Reply #9 on: May 25, 2019, 03:23:11 am »
CLRS is the standard book. It's a real tome, however, and assumes a certain level of mathematical maturity. Another real gem I have on my shelf is "Data Structures and C Programs" by Christopher J. Van Wyk. It is unfortunately out of print, so you'll have to find it used; you definitely want the newer edition that uses ANSI C examples. The material is a lot more approachable for the beginner than CLRS, as Van Wyk uses real code examples rather than pseudocode and assumes a lot less about the reader's mathematical level.
« Last Edit: May 25, 2019, 03:25:59 am by 0culus »
 

Offline ankitdixitTopic starter

  • Newbie
  • Posts: 3
  • Country: in
Re: Data Structure and Algorithms
« Reply #10 on: May 31, 2019, 07:26:33 am »
Thanks to all for your suggestions. When I was browsing the web to learn about coding algorithms, I found a site with a list of data structure programming tutorials recommended by the programming community. Check this: https://hackr.io/tutorials/learn-data-structures-algorithms
 

Offline westfw

  • Super Contributor
  • ***
  • Posts: 4305
  • Country: us
Re: Data Structure and Algorithms
« Reply #11 on: June 01, 2019, 09:35:45 am »
Quote
I found the Robert Sedgewick books “Algorithms in C” and “Algorithms in C++” to be excellent.
I second the recommendation.
The online classes he teaches are also very good...
https://www.coursera.org/learn/algorithms-part1

CLRS is apparently https://en.wikipedia.org/wiki/Introduction_to_Algorithms (in case that's as un-obvious to others as it was to me).
 
The following users thanked this post: oPossum


Online coppice

  • Super Contributor
  • ***
  • Posts: 9384
  • Country: gb
Re: Data Structure and Algorithms
« Reply #13 on: June 01, 2019, 12:16:57 pm »
Algorithm is a pretty broad term. The problem with any book with a title like "Introduction to Algorithms" is that it covers the algorithms the authors thought relevant, but the majority of people looking for algorithms to deal with their own problems won't find what they need there. You really need to look through the book's table of contents and see if there is a good selection of material that might have some relevance to you.
 

Offline westfw

  • Super Contributor
  • ***
  • Posts: 4305
  • Country: us
Re: Data Structure and Algorithms
« Reply #14 on: June 02, 2019, 10:32:04 am »
There are a bunch of "standard DS&A concepts" that "everyone should know", if mostly from the point of view of "when can I just blast out obvious code, and when should I go look for a well-written library that does something more complicated?" (These days, probably no one should write their own quicksort function, for example, but knowing when you might need one, and when it wouldn't be a good idea, is important; a rough sketch of the library route is below.)
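For the "use a well-written library" case, a minimal sketch using the C standard library's qsort() (which may or may not be an actual quicksort underneath):

#include <stdio.h>
#include <stdlib.h>

/* Comparator for qsort(): returns <0, 0, >0 like strcmp(). */
static int cmp_int(const void *pa, const void *pb)
{
    int a = *(const int *)pa;
    int b = *(const int *)pb;
    return (a > b) - (a < b);   /* avoids the overflow risk of a - b */
}

int main(void)
{
    int data[] = { 42, 7, -3, 19, 0 };
    size_t n = sizeof data / sizeof data[0];

    qsort(data, n, sizeof data[0], cmp_int);

    for (size_t i = 0; i < n; i++)
        printf("%d ", data[i]);
    printf("\n");
    return 0;
}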
Other things seem to go in and out of style.  Back when I took a DS class (late 1970s), we spent a lot of time on disk file organization, which I haven't seen mentioned much recently.  IIRC, Volume 2 of Knuth is half random numbers (aimed mostly at simulation, whereas nowadays you'd want a more cryptographically oriented approach) and half numeric algorithms that are generally left to hardware these days.  Volume 1 spends a bunch of pages defining a generic assembly language to use for programming examples.
There is a bunch of numeric stuff (PID, FFT, numerical differentiation, integration, and differential equations, computer graphics, cryptography, and so on) that seems to be considered a set of separate topics (not often taught to CS majors?  No one does MATH with computers any more).
There are also published algorithms for every branch of CS or engineering (standardized stuff for figuring out a reasonable retransmission timeout for TCP packets, for example, which has undergone a fascinating evolution since RFC 793...).

So it does depend on what you're after.  If you need an A&DS class on your resume, read one of the books and/or take one of the classes that have been mentioned.  Know your stacks, linked lists, and assorted sorts and searches.  If you need something more specialized, you may need to search...
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6844
  • Country: fi
    • My home page and email address
Re: Data Structure and Algorithms
« Reply #15 on: June 02, 2019, 10:37:04 pm »
when i was young, the 1st edition
I have a copy of the second edition of it, Data Structures and Algorithm Analysis in C by Mark Allen Weiss.  It works well as an introduction to different types of trees, disjoint sets, and other abstract data structures, and it also shows how to analyse the behaviour of algorithms (time and space complexity) properly.  If you've written command-line utilities or similar code in C, that book works well as an introduction to the scientific side, helping you develop the knowledge needed to make practical choices.  Or, if you want to know how to analyse and compare algorithms, that book shows how.

I do high-performance computing, specifically molecular dynamic simulations.

For random number generation, I don't trust written publications.  Even Numerical Recipes suggests some really bad linear congruential generators.  (I personally have found I only need Mersenne Twisters and Xorshift generators, although I write my own implementations so that they're trivial to replace if something better or more interesting comes up.)  However, the methods of generating non-uniform distributions are well described in NR and elsewhere, if you happen to need them.  (On the gripping hand, things like generating 3D points with a uniform distribution on a sphere, or inside a ball, are still faster to do using Xorshift and the exclusion (rejection) method than by direct generation of the proper distribution, because trigonometric functions like sin and cos are slow compared to generating a couple of extra random numbers using Xorshift or a Mersenne Twister.  So, don't trust that "faster" really means "faster", unless you know in which context, and with which unstated presuppositions.)
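To illustrate the exclusion (rejection) idea, a minimal sketch assuming a xorshift64* generator (the seed below is just an example; seed it properly in real use): draw points uniformly in the cube [-1,1]^3 and keep only those inside the unit ball.

#include <stdint.h>

/* xorshift64* PRNG; state must be nonzero. */
static uint64_t prng_state = 0x9E3779B97F4A7C15u;   /* example seed only */

static uint64_t xorshift64s(void)
{
    uint64_t x = prng_state;
    x ^= x >> 12;
    x ^= x << 25;
    x ^= x >> 27;
    prng_state = x;
    return x * 0x2545F4914F6CDD1Du;
}

/* Uniform double in [-1, 1), using the top 53 bits. */
static double uniform_pm1(void)
{
    return (double)(xorshift64s() >> 11) / 9007199254740992.0 * 2.0 - 1.0;
}

/* Uniform point inside the unit ball, by rejection: draw from the
   cube [-1,1]^3 and retry until the point lands inside the ball.
   About 52% of draws are accepted (pi/6 of the cube volume), so on
   average fewer than two draws per point, and no sin/cos/cbrt calls. */
static void random_in_ball(double p[3])
{
    do {
        p[0] = uniform_pm1();
        p[1] = uniform_pm1();
        p[2] = uniform_pm1();
    } while (p[0]*p[0] + p[1]*p[1] + p[2]*p[2] > 1.0);
}

For a point on the surface of the sphere, you can normalize the accepted vector by its length (rejecting the rare near-zero vector first).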

With cryptographically secure random number generation, the problem is initializing and maintaining state properly.  (This is what once bit Debian's OpenSSL, for example.)  And that is about proper software engineering, and not computer science per se; about secure coding practices, not about algorithms.

Microcontrollers and computers nowadays have one huge difference: caching behaviour.  Computers have multi-level caching, with cache locality being a much bigger (relative) bottleneck than most realize; and I have not seen this covered well in any book.  It gets even more complicated in practice, as cacheline sizes vary, as do the timing behaviour of single-instruction-multiple-data vector extensions (like SSE and AVX on Intel/AMD processors).  On the 32-bit microcontrollers I have (Cortex M0, M4, M4F), the only real difference is between RAM and Flash accesses.  (The 8-bit ones I have, have Harvard architecture: code is executed from Flash, and dedicated instructions/functions/variable attributes are used to access any data in Flash, compared to normal RAM data access.)

For numerical computation, all current architectures use IEEE-754 binary32 ("float") and binary64 ("double") hardware floating-point representations.  This means that there is a host of "bit hacks" (stuff like radix sorting floating-point values) that are practical and portable in practice, but not usually covered in books or even in-depth material.  (It is therefore quite funny how deep some people go into comparing different O(N log N) sorting algorithms for large N, as if they were not aware that a practical O(N) solution exists; it's just a matter of the breaking point N varying a lot between architectures, due to radix sorting requiring some offline storage, which makes it cache-sensitive.)
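As an illustration of one such bit hack, a rough sketch of the key transform that lets an ordinary LSD radix sort order IEEE-754 binary32 values (assuming float and uint32_t are both 32 bits, and ignoring NaN ordering):

#include <stdint.h>
#include <string.h>

/* Map a 32-bit IEEE-754 float to a uint32_t so that unsigned integer
   ordering of the keys matches numeric ordering of the floats:
   flip the sign bit of non-negative values, flip all bits of
   negative values. */
static uint32_t float_to_radix_key(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);                /* type-pun without UB */
    return (u & 0x80000000u) ? ~u : (u | 0x80000000u);
}

/* Inverse mapping, for writing the sorted floats back out. */
static float radix_key_to_float(uint32_t key)
{
    uint32_t u = (key & 0x80000000u) ? (key & 0x7FFFFFFFu) : ~key;
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

The keys can then be run through a plain byte-wise radix sort, which is O(N) in the number of elements.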

My suggestion would be to pick a small subset of problems you're interested in, to approach data structures and algorithms from.  A practical approach, in other words.

My favourite example I probably have described on this forum elsewhere already, is to write a simple POSIX sort command replacement.  It turns out that the most user-friendly version (least real time used for any given data set; i.e., the least real-world time the user needs to wait for the command to complete) is the one that reads the input lines into a tree or heap: because storage I/O is slow, doing most of the sorting while still transferring data from disk means the least real-world time taken, even if the program spends more CPU time than if it first read everything into memory and then used some nifty optimized sorting algorithm to sort the data.  Yes, "slower" is sometimes "faster".  And yes, it also means we really should have a command-line "interactive sorter" and a "noninteractive sorter", depending on which we care more about: the CPU resources spent, or the wall clock time taken.  Implementing both the read-into-tree/heap and read-then-sort approaches, then tweaking them to run the fastest for different sizes of randomized source data (and for different definitions of "fastest": CPU time used, and wall clock time elapsed), and understanding why they behave that way on that particular hardware, is a very illuminating, albeit time-consuming, exercise.
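A stripped-down sketch of the read-into-a-heap idea (assuming the data fits in memory, ignoring locale-aware collation and most error handling): each line goes into a binary min-heap as soon as getline() returns it, so much of the comparison work overlaps the slow storage I/O, and the sorted output falls out by repeatedly popping the smallest line.

#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static char **heap;
static size_t heap_len, heap_cap;

/* Push one line into a binary min-heap of strings (sift up). */
static void heap_push(char *line)
{
    if (heap_len >= heap_cap) {
        heap_cap = heap_cap ? heap_cap * 2 : 1024;
        heap = realloc(heap, heap_cap * sizeof heap[0]);
        if (!heap) exit(EXIT_FAILURE);
    }
    size_t i = heap_len++;
    while (i > 0 && strcmp(line, heap[(i - 1) / 2]) < 0) {
        heap[i] = heap[(i - 1) / 2];
        i = (i - 1) / 2;
    }
    heap[i] = line;
}

/* Pop the smallest line (sift down). */
static char *heap_pop(void)
{
    char *top  = heap[0];
    char *last = heap[--heap_len];
    size_t i = 0;
    for (;;) {
        size_t child = 2 * i + 1;
        if (child >= heap_len) break;
        if (child + 1 < heap_len && strcmp(heap[child + 1], heap[child]) < 0)
            child++;
        if (strcmp(last, heap[child]) <= 0) break;
        heap[i] = heap[child];
        i = child;
    }
    heap[i] = last;
    return top;
}

int main(void)
{
    char *line = NULL;
    size_t size = 0;

    /* Sorting overlaps the (slow) reads: each line is placed into
       the heap as soon as it arrives. */
    while (getline(&line, &size, stdin) != -1)
        heap_push(strdup(line));
    free(line);

    while (heap_len > 0) {
        char *s = heap_pop();
        fputs(s, stdout);
        free(s);
    }
    free(heap);
    return 0;
}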
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 4468
  • Country: nz
Re: Data Structure and Algorithms
« Reply #16 on: June 02, 2019, 11:56:56 pm »
For random number generation, I don't trust written publications.  Even Numerical Recipes suggests some really bad linear congruential generators.  (I personally have found I only need Mersenne Twisters and Xorshift generators, although I write my own implementations so that they're trivial to replace if something better or more interesting comes up.)

We've learned a lot since then. And also about hashing functions that are not actually cryptographically secure, but are very close to it and also super fast: the whole evolution of MurmurHash, CityHash, FarmHash, MetroHash, for example.

Quote
However, the methods of generating non-uniform distributions are well described in NR and elsewhere, if you happen to need them.  (On the gripping hand, things like generating 3D points with a uniform distribution on a sphere, or inside a ball, are still faster to do using Xorshift and the exclusion (rejection) method than by direct generation of the proper distribution

Sure, there are tons of things that are much faster to check than to generate directly, and as long as the area/volume is a reasonable proportion of the unit square/cube the time wasted by rejected points is well worth it. So Mote it be.

Quote
Microcontrollers and computers nowadays have one huge difference: caching behaviour.  Computers have multi-level caching, with cache locality being a much bigger (relative) bottleneck than most realize; and I have not seen this covered well in any book.

Even fewer people realise that the reason their algorithm is slow is not even memory caching, but TLB. Many modern ARM processors can only jump around randomly between a maximum of 32 4kB memory pages before you start thrashing the TLB.

Even, for example, Intel Skylake can reference only 64 4kB pages from the L1 data TLB -- and it's only 4-way associative, so badly-chosen access patterns can see even five frequently-accessed bytes/words in different pages causing repeated TLB misses. The L2 TLB references 1536 pages (6144 kB). An L1 TLB miss that hits in L2 TLB incurs a 9 cycle delay.

Even if you can't get your frequently-used data into the same cache line, at least get it into the same 4kB page, people!

(megapages/gigapages do alleviate this if the virtual->physical mapping has large contiguous regions, and your OS supports identifying and exploiting this automatically)

Quote
My suggestion would be to pick a small subset of problems you're interested in, to approach data structures and algorithms from.  A practical approach, in other words.

My personal bugbear is people still tediously learning (and coding!) AVL and RedBlack trees!

The properties of caches, TLBs, and VM make binary trees basically obsolete, at least if you care about performance on large data sets. It makes much more sense to use what were originally developed as on-disk data structures, such as B-trees. If you don't care about iterating through the data in sorted order then hash tables are even better -- power-of-two sized hash tables, indexed by a hash code generated using one of those new hash functions I mentioned above (or if you have SHA hardware that may be faster).
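A rough sketch of the power-of-two-table idea, using a splitmix64-style mixer as a stand-in for the fast hashes named above (Murmur/City/Farm/Metro are more involved): because the table size is a power of two, reducing the hash to an index is a mask rather than a modulo.

#include <stdint.h>
#include <stddef.h>

/* splitmix64-style finalizer: cheap, decent avalanche; stand-in here
   for a "real" fast hash such as Murmur/City/Farm/Metro. */
static uint64_t mix64(uint64_t x)
{
    x ^= x >> 30; x *= 0xBF58476D1CE4E5B9u;
    x ^= x >> 27; x *= 0x94D049BB133111EBu;
    x ^= x >> 31;
    return x;
}

#define TABLE_BITS 16
#define TABLE_SIZE ((size_t)1 << TABLE_BITS)    /* power of two */

struct entry {
    uint64_t key;
    uint64_t value;
    int      used;
};

static struct entry table[TABLE_SIZE];

/* Open addressing with linear probing; index = hash & (size - 1),
   i.e. a mask instead of an expensive modulo.  Assumes the table is
   never completely full, so the probe loop terminates. */
static struct entry *lookup(uint64_t key)
{
    size_t i = (size_t)mix64(key) & (TABLE_SIZE - 1);
    while (table[i].used && table[i].key != key)
        i = (i + 1) & (TABLE_SIZE - 1);
    return &table[i];    /* either the matching slot or an empty one */
}

Insertion is then just filling in the returned slot if it is empty, keeping the load factor well below one.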

If you really really feel you need a binary tree somewhere, look up "Scapegoat Tree". It doesn't need any extra data in each node, is far smaller code and simpler to write (even from memory, for example in an exam or competition), and also outperforms AVL and RedBlack by a lot.

Quote
My favourite example I probably have described on this forum elsewhere already, is to write a simple POSIX sort command replacement.  It turns out that the most user-friendly version (least real time used for any given data set; i.e., the least real-world time the user needs to wait for the command to complete) is the one that reads the input lines into a tree or heap: because storage I/O is slow, doing most of the sorting while still transferring data from disk means the least real-world time taken, even if the program spends more CPU time than if it first read everything into memory and then used some nifty optimized sorting algorithm to sort the data.

That makes sense if your RAM is big enough to hold the file being sorted, but heap and tree data structures are not very friendly to Virtual Memory.

Back in 1985 I had a 28 MB data file that I needed to (re)sort fairly frequently, on a Data General MV/10000 with 4 MB RAM (and about 30 users).

I found that reading the data into an array in (virtual) memory and then using quicksort took about two minutes, vs over an hour for the system sort program.

Quicksort is very Virtual Memory friendly, as it has two pointers stepping one item at a time through the array (one backwards), and it examines (and maybe swaps) every item in every cache line or page of memory that it touches. Once a particular partition gets down to below your physical RAM size it will automatically be done entirely in-memory, and very quickly.
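For reference, a minimal sketch of that two-pointer (Hoare-style) partition step; both indices move one element at a time toward each other, so every cache line or page the scan touches is used completely before moving on:

#include <stddef.h>

/* Hoare partition: scan inward from both ends, swapping out-of-place
   elements; every element is accessed sequentially. */
static size_t partition(int *a, size_t lo, size_t hi)
{
    int pivot = a[lo + (hi - lo) / 2];
    size_t i = lo;
    size_t j = hi;

    for (;;) {
        while (a[i] < pivot) i++;
        while (a[j] > pivot) j--;
        if (i >= j)
            return j;
        int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
        i++;
        j--;
    }
}

static void quicksort(int *a, size_t lo, size_t hi)
{
    if (lo < hi) {
        size_t p = partition(a, lo, hi);
        quicksort(a, lo, p);        /* with Hoare, the pivot index stays in the left part */
        quicksort(a, p + 1, hi);
    }
}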
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6844
  • Country: fi
    • My home page and email address
Re: Data Structure and Algorithms
« Reply #17 on: June 03, 2019, 02:32:28 pm »
We've learned a lot since then. And also about hashing functions that are not actually cryptographically secure, but are very close to it and also super fast. The whole evolution MurmurHash, CityHash, FarmHash, MetroHash for example.
Yes, exactly; and software engineering-wise, we buttress even cryptographic hash functions by making rare operations slow by repeated hashing. (For example, password hashing should be slow, so that dictionary attacks and similar are too costly, time-wise, in practice.)
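A very rough sketch of the "deliberately slow" idea, with hash256 standing in for a real cryptographic hash (e.g. SHA-256 from a vetted library); a real system would use an established KDF such as PBKDF2, scrypt, or Argon2 rather than a hand-rolled loop like this:

#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Caller supplies the actual 256-bit hash function. */
typedef void (*hash256_fn)(const uint8_t *in, size_t len, uint8_t out[32]);

/* Key stretching: each password guess now costs ~ITERATIONS hashes,
   which makes offline dictionary attacks proportionally slower. */
#define ITERATIONS 100000u

void slow_password_hash(hash256_fn hash256,
                        const uint8_t salt[16],
                        const char *password,
                        uint8_t out[32])
{
    uint8_t first[16 + 128];
    uint8_t buf[16 + 32];
    size_t  pwlen = strlen(password);

    if (pwlen > 128)
        pwlen = 128;        /* sketch only: don't silently truncate in real code */

    /* Round 1: hash(salt || password). */
    memcpy(first, salt, 16);
    memcpy(first + 16, password, pwlen);
    hash256(first, 16 + pwlen, out);

    /* Rounds 2..ITERATIONS: hash(salt || previous digest). */
    for (unsigned i = 1; i < ITERATIONS; i++) {
        memcpy(buf, salt, 16);
        memcpy(buf + 16, out, 32);
        hash256(buf, sizeof buf, out);
    }
}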

I like that approach a lot.  We trust the science, but also apply sensible engineering principles to give us a practical buffer or leeway, in case the research turns out to be faulty somehow.  I would love to see this applied more in real-world programming.

Sure, there are tons of things that are much faster to check than to generate directly, and as long as the area/volume is a reasonable proportion of the unit square/cube the time wasted by rejected points is well worth it.
I definitely agree. My point was that not many books/lessons I've seen cover this aspect, or they cover it with impractical assumptions (like "if trigonometric and logarithmic/exponential functions are as fast as multiplication and division"), usually without even stating those assumptions.  This leads to learners getting a completely wrong intuitive grasp of the subject at hand.  Then, later on, they either get super frustrated when things don't make sense, or worse yet, they spread their incorrect understanding to others.

Even fewer people realise that the reason their algorithm is slow is not even memory caching, but TLB.
Exactly!  This is exacerbated by poor testing ("it works fine on my machine!") on common architectures like x86-64, where the behaviour/limits vary a lot.

Typically, you see someone microbenchmarking an operation, tweaking it to be "perfect" for a specific compiler version and options and hardware, when nothing else is running; and when incorporated into a larger application or system, the slowdown its poor cache behaviour causes to the overall operation, is blamed on somebody else.

My personal bugbear is people still tediously learning (and coding!) AVL and RedBlack trees!
They have their uses in OS and database implementation, but not much else, in my experience.  Oh, and when learning how to debug buggy code!  (That is, having a buggy binary-search/AVL/Red-Black/Splay tree implementation, and trying to locate the bug(s).)

Quote
My favourite example I probably have described on this forum elsewhere already, is to write a simple POSIX sort command replacement. [...]
That makes sense if your RAM is big enough to hold the file being sorted, but heap and tree data structures are not very friendly to Virtual Memory.
As it is a simplified POSIX command replacement exercise, having enough RAM (+swap) is a sensible assumption; most sort implementations do that anyway.

However, my point was exactly that even though it is technically inefficient to read the input into a tree or heap, the end result is more human-friendly: less elapsed real world time to get the result, even if more CPU time is used.  It is somewhat counterintuitive, and thus important to understand, IMHO.  (It also teaches one to ask better questions, like when somebody needs something to be fast, you know to ask "do you need it to complete as soon as possible, or to be as efficient (use as little CPU/memory resources) as possible?" Only a PHB says both, BTW.)

I found that reading the data into an array in (virtual) memory and then using quicksort took about two minutes, vs over an hour for the system sort program.
Similar findings are still very applicable in both embedded computing and distributed high-performance computing.  At the small end, memory is still quite limited (most routers have megabytes, not gigabytes of RAM, unlike phones, desktop computers, and servers); at the large end, latencies make people wait, and data transfer bandwidth is limited compared to the huge, huge amount of data one can process.

I still occasionally encounter related problems in real life (in large, distributed simulations): variants of external sorting (or rather external partitioning, instead of full sorting; the full dataset is distributed across a number of machines, and must be re-distributed to balance the amount of computation among them).  Currently, it is "solved" by transferring all the data to a single node, with that single node then parceling it back out to the other nodes.  You'd think that is okay, until you factor in the amount of real-world time wasted across entire clusters, time and again, just because that problem was considered not "important enough" to be solved by some programmers; and that at least that one node then needs enough memory to handle the entire dataset.  Most people in my field simply say "InfiniBand is fast enough, don't worry about it", but I'm not willing to take handwaving as evidence, as I've found such handwaving is often the result of a stinking pile of poo instead of knowledge.
 

Online coppice

  • Super Contributor
  • ***
  • Posts: 9384
  • Country: gb
Re: Data Structure and Algorithms
« Reply #18 on: June 03, 2019, 03:41:19 pm »
My personal bugbear is people still tediously learning (and coding!) AVL and RedBlack trees!
While these algorithms in their bare forms are a lot less useful than they used to be, I think they are an excellent starting point to get the student thinking about the nature of the problems these algorithms address. I don't think you can leap to memory aware algorithms without covering these basics.
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 4468
  • Country: nz
Re: Data Structure and Algorithms
« Reply #19 on: June 03, 2019, 10:03:36 pm »
My personal bugbear is people still tediously learning (and coding!) AVL and RedBlack trees!
While these algorithms in their bare forms are a lot less useful than they used to be, I think they are an excellent starting point to get the student thinking about the nature of the problems these algorithms address. I don't think you can leap to memory aware algorithms without covering these basics.

I think teaching simple binary trees is certainly useful, while noting that they can become unbalanced (e.g., degenerate to a linear list if elements are inserted in sorted order), but not attempting to balance them -- at least not by the fiddly little rotations and twirls used in AVL or RedBlack. I don't think that stuff benefits anyone.

It *is* useful to know you can re-balance an arbitrary binary tree in O(N) time, by simply converting it to degenerate list form down, say, the right links, and then recursively building a perfect tree: make a (left child) tree from the first N/2 items, take the next item as the root of the tree, and recursively build a (right child) perfect tree from the remaining items.

(that's what Scapegoat Tree does when it notices the tree is too much out of balance)
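A minimal sketch of that flatten-then-rebuild step, assuming a plain node struct and that the caller tracks the node count:

#include <stddef.h>

struct node {
    int          key;
    struct node *left, *right;
};

/* Flatten a BST into a sorted list linked through ->right, by a
   reverse in-order traversal that prepends to 'head'.
   O(n) time, O(tree height) stack. */
static struct node *flatten(struct node *t, struct node *head)
{
    if (!t)
        return head;
    head = flatten(t->right, head);
    t->right = head;
    head = flatten(t->left, t);
    t->left = NULL;
    return head;
}

/* Build a perfectly balanced tree from the first n nodes of the list;
   '*list' is advanced past the consumed nodes. */
static struct node *build(struct node **list, size_t n)
{
    if (n == 0)
        return NULL;
    struct node *left = build(list, n / 2);       /* first n/2 nodes   */
    struct node *root = *list;                    /* next node is root */
    *list = root->right;
    root->left  = left;
    root->right = build(list, n - n / 2 - 1);     /* remaining nodes   */
    return root;
}

/* Rebalance a tree of n nodes in O(n) time. */
static struct node *rebalance(struct node *t, size_t n)
{
    struct node *list = flatten(t, NULL);
    return build(&list, n);
}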
 

Offline 0culus

  • Super Contributor
  • ***
  • Posts: 3032
  • Country: us
  • Electronics, RF, and TEA Hobbyist
Re: Data Structure and Algorithms
« Reply #20 on: June 04, 2019, 04:08:48 am »
My personal bugbear is people still tediously learning (and coding!) AVL and RedBlack trees!
While these algorithms in their bare forms are a lot less useful than they used to be, I think they are an excellent starting point to get the student thinking about the nature of the problems these algorithms address. I don't think you can leap to memory aware algorithms without covering these basics.

Yeah, the point of covering this in an undergrad course is to build mathematical maturity around a lot of things that boil down to discrete mathematics, writing suitable proofs of correctness, and grokking how big-Oh (and related notations) work. It's a building block. A hard one, but necessary. It was the weed-out course in my CS curriculum.
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6844
  • Country: fi
    • My home page and email address
Re: Data Structure and Algorithms
« Reply #21 on: June 04, 2019, 02:46:01 pm »
Funny aside:  When debugging trees and graphs, the printf() debugging method with output in Graphviz DOT language is a superior, indispensable tool; even though "printf debugging" is usually a derisive term, like "three star programming" in C.

If you haven't used Graphviz when debugging/analysing misconstructed trees or graphs, you've probably done it the unnecessarily complex and hard way.
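For the record, a minimal sketch of such a dump for a binary tree; pipe the output through dot -Tpng > tree.png (or -Tsvg) to get the picture:

#include <stdio.h>
#include <inttypes.h>

struct node {
    int          key;
    struct node *left, *right;
};

/* Emit one DOT statement per node and per child pointer; NULL children
   become small point-shaped nodes so missing links are visible. */
static void dump_edges(FILE *out, const struct node *t)
{
    if (!t)
        return;
    fprintf(out, "  n%" PRIxPTR " [label=\"%d\"];\n", (uintptr_t)t, t->key);

    const struct node *child[2] = { t->left, t->right };
    for (int i = 0; i < 2; i++) {
        if (child[i])
            fprintf(out, "  n%" PRIxPTR " -> n%" PRIxPTR ";\n",
                    (uintptr_t)t, (uintptr_t)child[i]);
        else
            fprintf(out, "  nil%" PRIxPTR "_%d [shape=point];"
                         "  n%" PRIxPTR " -> nil%" PRIxPTR "_%d;\n",
                    (uintptr_t)t, i, (uintptr_t)t, (uintptr_t)t, i);
    }
    dump_edges(out, t->left);
    dump_edges(out, t->right);
}

/* Usage: dump_tree(stderr, root); */
static void dump_tree(FILE *out, const struct node *root)
{
    fprintf(out, "digraph tree {\n  node [shape=circle];\n");
    dump_edges(out, root);
    fprintf(out, "}\n");
}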
 

