An issue with a sorting algo is just how random or discrete the input data is. [...] This is far beyond my school maths
It's not nearly as difficult to quantify as one might believe; algorithm analysis isn't hard per se. It's one of those things where the hurdle is understanding the concepts first, and then the mathy part is almost always very easy.
A first approximation is to examine the best-case cost, the amortized average cost, and the worst-case cost, and to estimate how often each of them occurs.
You can trivially instrument your sort function to count the number of comparisons and swaps (or whatever primitives it uses), and then examine how they vary for different inputs.
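To illustrate what that instrumentation might look like, here is a minimal sketch in C. The counter names, the less_than()/swap_ints() helpers and the insertion sort are placeholders of my own choosing; the same idea applies to whatever sort and primitives you actually use:

    #include <stddef.h>

    unsigned long n_cmp, n_swap;     /* comparison and swap counters */

    static int less_than(int a, int b) { n_cmp++; return a < b; }

    static void swap_ints(int *a, int *b) { int t = *a; *a = *b; *b = t; n_swap++; }

    /* Any sort will do; insertion sort is shown only because it is short. */
    void instrumented_sort(int *v, size_t n)
    {
        for (size_t i = 1; i < n; i++)
            for (size_t j = i; j > 0 && less_than(v[j], v[j-1]); j--)
                swap_ints(&v[j], &v[j-1]);
    }

After a run, n_cmp and n_swap tell you how the cost varied for that particular input.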
One interesting and practically useful way to examine the statistical distribution of the complexity of a sort operation, that is, how it behaves for different degrees of sortedness in the input, is to start with perfectly ordered data (one case being the preferred order, the other the inverse order), and then increasingly shuffle it (using a known good pseudorandom shuffle; this is very important), counting the number of operations when sorting the shuffled data. Repeat this a few times with different random sequences, and you get an excellent statistical picture of how the sort behaves overall.
(Note that if you have N elements in the array, you only need a comparable number of shuffle operations (insertions or swaps), say 7N, to "completely randomize" it. So this is also a very practical way of characterising a sort operation in terms of its primitives.)
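As a sketch of that experiment, building on the instrumented sort above: start from perfectly ordered data, apply an increasing number of random transpositions, and record the operation counts. The names and parameters here are mine, and rand() is used only to keep the sketch short; for real measurements use a known good pseudorandom generator, as noted above.

    #include <stdio.h>
    #include <stdlib.h>

    /* from the previous snippet */
    extern unsigned long n_cmp, n_swap;
    extern void instrumented_sort(int *v, size_t n);

    int main(void)
    {
        enum { N = 1000 };
        static int data[N];

        /* increasing amounts of shuffle: 0, 1, 2, 4, ... random transpositions */
        for (size_t k = 0; k <= 8 * N; k = k ? 2 * k : 1) {
            for (size_t i = 0; i < N; i++)      /* start from perfectly ordered data */
                data[i] = (int)i;

            srand(12345);                       /* fixed seed, so runs are repeatable */
            for (size_t s = 0; s < k; s++) {    /* apply k random transpositions */
                size_t a = (size_t)rand() % N, b = (size_t)rand() % N;
                int t = data[a]; data[a] = data[b]; data[b] = t;
            }

            n_cmp = n_swap = 0;
            instrumented_sort(data, N);
            printf("%4zu transpositions: %lu comparisons, %lu swaps\n",
                   k, n_cmp, n_swap);
        }
        return 0;
    }

Plot the counts against the number of transpositions and you can see directly how quickly the cost approaches its steady state.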
In general, for a good sort function with bad worst-case behaviour (like, say, Quicksort), you want the number of operations to descend very quickly from the worst case towards the minimum as the amount of shuffling/"randomness" in the original data increases, with the minimum reached for most inputs.
As to benchmarking actual sorting code, the key is to always use real-world data. Current computers are so complex wrt. timing that microbenchmarks just don't cut it.
When you have large enough arrays, the cache access pattern of the data becomes the bottleneck, unless your data structures are not well suited to sorting anyway. (Sorting, say, floating-point numbers can be done in linear time using a radix sort, since there is a fixed number of possible keys. Because of its cache behaviour, though, radix sort is slower than asymptotically worse algorithms (those with O(N log N) time complexity in particular) until N becomes unreasonably large, something on the order of 10⁹ or more. And even then, it usually turns out that you can do the sorting "for free" while doing other stuff to the data, because I/O is the bottleneck; the "cost" of the sort is then just a bit more CPU work, while the wall-clock time taken is not affected.)
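For the curious, here is a rough sketch of that linear-time approach for 32-bit IEEE-754 floats, using the well-known trick of mapping each value's bit pattern to an unsigned integer whose ordering matches the float ordering, then doing a byte-wise LSD radix sort (four counting-sort passes) and mapping back. Function names are mine, and this is purely illustrative, not a tuned implementation:

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    /* Map a float's bits to a uint32_t whose unsigned order matches float order:
       negative values get all bits flipped, non-negative ones only the sign bit. */
    static uint32_t float_to_key(float f)
    {
        uint32_t u;
        memcpy(&u, &f, sizeof u);
        return (u & 0x80000000u) ? ~u : (u | 0x80000000u);
    }

    static float key_to_float(uint32_t u)
    {
        u = (u & 0x80000000u) ? (u & 0x7fffffffu) : ~u;
        float f;
        memcpy(&f, &u, sizeof f);
        return f;
    }

    /* LSD radix sort on 8-bit digits: four counting-sort passes, O(N) overall. */
    void radix_sort_floats(float *v, size_t n)
    {
        uint32_t *a = malloc(n * sizeof *a), *b = malloc(n * sizeof *b);
        if (!a || !b) { free(a); free(b); return; }

        for (size_t i = 0; i < n; i++)
            a[i] = float_to_key(v[i]);

        for (int shift = 0; shift < 32; shift += 8) {
            size_t count[257] = { 0 };
            for (size_t i = 0; i < n; i++)           /* histogram of this byte */
                count[((a[i] >> shift) & 0xff) + 1]++;
            for (int d = 0; d < 256; d++)            /* prefix sums -> offsets */
                count[d + 1] += count[d];
            for (size_t i = 0; i < n; i++)           /* stable scatter into b */
                b[count[(a[i] >> shift) & 0xff]++] = a[i];
            uint32_t *t = a; a = b; b = t;           /* swap buffers */
        }

        for (size_t i = 0; i < n; i++)               /* four passes: a holds the result */
            v[i] = key_to_float(a[i]);

        free(a); free(b);
    }

Note the extra passes over two full-size buffers; that is exactly the cache-unfriendly access pattern that keeps this behind O(N log N) comparison sorts until N gets very large.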
Then again, if you are sorting in the background and don't care how long it takes, because you want to minimize the total CPU and I/O load needed to complete the sort, you need a completely different algorithm and approach than when there is a human waiting for the sort to complete (in which case you want to minimize the latency caused by the sorting, i.e. the wall-clock time taken before the human can continue doing whatever it is they were doing).
Which means that there is no "best" sorting algorithm at all, because the criteria for "goodness" depend on the use case.
To find out, all you need is time and effort. However, today's world seems to be hell-bent on just getting it done as fast as possible, so throwing a library or sort algorithm that everybody else is using at such a problem is a much more commercially viable approach. Nobody much appreciates efforts to compare different sort algorithms, because they've already made up their minds as to which one is superior.