Well, I can accept both arguments myself, as it really is about what information you consider most useful and important to convey to others.
I mostly deal with noisy datasets (with obviously erroneous measurements), especially microbenchmarking results for various algorithms, where events external to the benchmark occasionally affect the measurement; such errors are always relatively large, because any kind of interruption of the benchmark has a minimum duration. For these, I have found the most useful report to be a k'th percentile, as in: 97% of all runs completed within time T. The percentile I choose reflects my trust in the measurement setup; the higher it is, the more repeatable/reproducible (and thus better) the setup.
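To make that concrete, here is a rough Python sketch of that kind of percentile report; the timing values and the choice of 97% are made up, purely for illustration:

```python
# Illustrative only: reporting "p% of all runs completed within time T"
# for a noisy set of benchmark timings. The numbers below are made up.
import numpy as np

runs_ms = np.array([10.2, 10.3, 10.1, 10.4, 10.2, 35.7,   # 35.7 ms: external interruption
                    10.3, 10.2, 10.5, 10.1, 10.3, 10.2])

p = 97  # chosen percentile; the higher, the more I trust the setup
T = np.percentile(runs_ms, p)
print(f"{p}% of all runs completed within {T:.1f} ms")
```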
For other kinds of measurements, giving the actual distribution (or histogram) of the measurements is best, of course; that way one can best compare reproduced measurements, and any asymmetry or deviation from a normal/Gaussian distribution tells you something about the measurements and the values being measured. The next best thing, in my opinion, is the form \$m^{+a}_{-b} ~ (p\%)\$, where \$m\$ is the median, and \$a\$ and \$b\$ are the \$p\$'th-percentile error bounds with equal tail probabilities. That is, \$p\%\$ of all measurements were between \$m-b\$ and \$m+a\$, inclusive, with \$(100-p)/2\$ percent of measurements below \$m-b\$ and \$(100-p)/2\$ percent above \$m+a\$. While the standard 68.3% error bars are well suited to normal/Gaussian distributions, one can choose a much larger \$p\$ to describe one's confidence in the testing methodology: the higher the \$p\$, the fewer measurements are rejected, and therefore the better the confidence in the implementation of the measurements.
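In case it helps, here is a minimal Python sketch of how I'd compute that form, with synthetic (skewed) data standing in for real measurements:

```python
# Minimal sketch of the m^{+a}_{-b} (p%) form: m is the median, and a and b are
# offsets such that p% of all measurements fall within [m-b, m+a], with equal
# tail probabilities (100-p)/2 on each side. Synthetic lognormal data for illustration.
import numpy as np

rng = np.random.default_rng(0)
data = rng.lognormal(mean=0.0, sigma=0.5, size=10_000)

p = 95.0
m = np.median(data)
lo = np.percentile(data, (100 - p) / 2)        # (100-p)/2 % of measurements lie below this
hi = np.percentile(data, 100 - (100 - p) / 2)  # (100-p)/2 % of measurements lie above this
a, b = hi - m, m - lo

print(f"m = {m:.3f} +{a:.3f} / -{b:.3f}  ({p:.0f}%)")
```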
I'm not a metrologist; I only watch the occasional Oxtools and Joe Pie video on YouTube. In all my posts in this thread, by "useful" I have meant "useful to those like myself, who use the reported numbers to build up an intuitive understanding of the properties and limits of this method/apparatus/approach". If I've understood correctly, the numbers you/gitm have reported correspond to \$p = 100\%\$, with \$\text{span} = 100\% \cdot (a + b) / m\$, and \$m\$ also reported. To get a correct mental picture of the limits of the distribution (leaving its shape unspecified), and of what kind of measurements one would "expect" when constructing a similar setup, it would be intuitively easier if \$m\$ referred to the median, not the mean. If you think about it, for that purpose, using the mean for \$m\$ makes it quite difficult to wrap one's mind around what it means for the entire distribution of measurements.
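Reading that back with the same \$a\$/\$b\$ offsets as above, a 100% interval runs from the minimum (\$m-b\$) to the maximum (\$m+a\$), so the span is just the full range divided by \$m\$. A quick, purely illustrative sketch (synthetic skewed data again) of how the choice of \$m\$ shifts the reported span:

```python
# Illustrative only: span = 100% * (max - min) / m, with m taken as either the
# median or the mean of the same synthetic, skewed data set.
import numpy as np

rng = np.random.default_rng(1)
data = rng.lognormal(mean=0.0, sigma=0.5, size=10_000)

full_range = data.max() - data.min()
for name, m in (("median", np.median(data)), ("mean", np.mean(data))):
    print(f"span relative to the {name}: {100.0 * full_range / m:.1f}%")
```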
Furthermore, if \$m\$ is the mean, then all comparisons to the result someone else got by duplicating the measurements must be based on the mean also. (Having the distributions/histograms, and knowing the median, is then useless information and not worth gathering.) Which is okay, if that is the way the equipment works, or how it has to be done in practice (by convention, or for some other reason).
The mean differs from the median only when the distribution is asymmetric (with respect to the median); for all symmetric distributions, mean = median. The difference between mean and median depends on the asymmetry and width (deviation) of the distribution. Thus, if you repeat an experiment (getting a set of results you wish to compare), and your errors have a different distribution than the one you are comparing to, the two means will differ from their respective medians by different amounts. Then, the error bars will also include different portions of the two distributions, neither centered on the middle of its distribution! Essentially, you are comparing apples to oranges. And without knowing the median of both, we cannot even tell that we're comparing apples to oranges, because we don't know how much each mean differs from its median. Essentially, the entire distribution becomes useless information, and we can only compare the two means.
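As a made-up illustration of that apples-to-oranges problem: two synthetic measurement sets with the same median but different skew end up with clearly different means, so intervals centered on the mean sit in different places within each distribution:

```python
# Synthetic illustration: two measurement sets with the same median (1.0) but
# different skew. Their means differ from the median by different amounts, so
# mean-centered error bars cover different portions of each distribution.
import numpy as np

rng = np.random.default_rng(2)
setup_a = rng.lognormal(mean=0.0, sigma=0.3, size=10_000)  # mildly skewed
setup_b = rng.lognormal(mean=0.0, sigma=0.8, size=10_000)  # strongly skewed

for name, d in (("setup A", setup_a), ("setup B", setup_b)):
    print(f"{name}: median = {np.median(d):.3f}, mean = {np.mean(d):.3f}, "
          f"mean - median = {np.mean(d) - np.median(d):+.3f}")
```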
Which, again, is acceptable if metrologists always work with the mean/average, never even look at the actual measurement distributions/histograms, and consider those non-useful. If everything you can compare to is mean/average-based, then having a histogram or distribution (or thinking in such terms!) is useless and extra unneeded work for nothing.