I guess that if you have three references, called A, B and C, and measure each one against each of the others, then you might get results along the lines of:
A vs B = noisy
A vs C = quiet
B vs C = noisy
In this case, A and C are both quiet, so the comparison between the two yields a quiet result too. But, since B is noisy, measuring it with respect to either of the two others yields a noisy result.
I'm not sure whether that type of comparison warrants a name of its own, but it seems like the obvious way to identify which one is the outlier. Moreover, if the three references were 'quiet', 'noisy' and 'noisier', then the three comparisons between them would show different degrees of noisiness, from which the relative merit of each reference could be worked out.
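To put rough numbers on that idea, here is a minimal sketch in Python. It rests on an assumption that is not stated above: the noise of the three references is uncorrelated, so the variance of each pairwise measurement is simply the sum of the two individual variances. Under that assumption the three pairwise results can be solved for the three individual contributions (in the frequency-stability world this is usually called the "three-cornered hat" method). The function names and the example figures are made up for illustration.

```python
def individual_variances(var_ab, var_ac, var_bc):
    """Estimate each reference's own noise variance from the pairwise variances.

    Assumption: the noise of A, B and C is uncorrelated, so that
    var_ab = var_a + var_b, var_ac = var_a + var_c, var_bc = var_b + var_c.
    """
    var_a = (var_ab + var_ac - var_bc) / 2
    var_b = (var_ab + var_bc - var_ac) / 2
    var_c = (var_ac + var_bc - var_ab) / 2
    return {"A": var_a, "B": var_b, "C": var_c}


def rank_references(var_ab, var_ac, var_bc):
    """Return the reference names ordered from quietest to noisiest."""
    estimates = individual_variances(var_ab, var_ac, var_bc)
    return sorted(estimates, key=estimates.get)


# Made-up pairwise variances (arbitrary units): A vs B, A vs C, B vs C.
print(individual_variances(10, 3, 11))  # {'A': 1.0, 'B': 9.0, 'C': 2.0}
print(rank_references(10, 3, 11))       # ['A', 'C', 'B']
```

With real measurements the estimates can come out slightly negative if the noise sources are correlated or the measurement runs are short, so treat the output as a ranking aid rather than an exact figure.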