Ok, I thought they were just copy-pasting whatever seemed somewhat relevant to things they found through trial and error. Kind of like asking "which equation is the best fit for a chaotic data set". Sometimes it makes sense when you can closely define it for physics reasons... however it's really delicate. They invented so much statistical magic to clear things up.
I can believe that there is bad data, because I've noticed it's really easy to fuck up a large data gathering operation with bad procedures or assumptions, and that this data might lead to some bullshit trend being found.
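That kind of bullshit trend is easy to manufacture by accident. A minimal sketch of the effect (the sample counts and the numpy setup here are mine, not from any actual lab): correlate one pure-noise "measurement" series against a pile of other pure-noise series and just keep the best match.

```python
import numpy as np

rng = np.random.default_rng(0)
target = rng.normal(size=50)             # 50 "measurements" of pure noise
candidates = rng.normal(size=(200, 50))  # 200 unrelated noise series

# Correlate every candidate series with the target and keep the best one.
corrs = [abs(np.corrcoef(c, target)[0, 1]) for c in candidates]
best = max(corrs)

# Everything above is noise, yet with enough candidates the winning
# correlation still looks respectable on paper.
print(f"best |correlation| found: {best:.2f}")
```

Nothing here is related to anything, but the winner still looks like a "trend", which is exactly what happens when an operation fishes through enough variables without anyone asking how the data was obtained.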
I think it's often dubious that they came to the conclusions they have with the 'background math' they sometimes talk about.
Because someone with no idea about the process gives some random good-sounding number for the # of electrical tests needed, the resolution required, etc. Then someone is told to implement this, and it's not actually possible to implement it like that, but it gets done anyway, because it is the job. Then they start cranking out data by any means necessary.
Data gathering roles in corporate often have the side effect of being tied to hard quotas. And this is a place where management thinks the process is simple enough to automate and hand to total drones. And the assumption that every case takes a similar amount of time to analyze is very often completely false; some tests end up having a lot of complications that get swept under the rug. AKA the outlier count is way higher than expected. Then it gets 'simplified' on the spot so the quota can be met. This is the domain of non-technical managers. No one wants to hear that some sample is taking an extremely long time to analyze, because the operation is typically deemed an 'easy job' that should 'run smooth'.
In essence, it might have a similar result to KGB sabotage.

This often makes me dubious of 'advanced conclusions' achieved through 'data science' and then used for certain equations. It's kind of like those people who think they can get 32 bits out of a 10-bit MCU ADC by doing enough data processing. Sometimes I think it should be called 'bored with matlab'.
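The ADC claim is easy to put numbers on. Back-of-envelope only, using the standard oversampling-and-decimation rule of thumb (each extra bit of effective resolution costs roughly 4x the samples, and that already assumes ideal white noise, which real hardware doesn't give you):

```python
# Rule of thumb: oversampling + decimation buys ~0.5 bit per doubling
# of the sample count, i.e. 4x the samples per extra effective bit.
adc_bits = 10
claimed_bits = 32
extra_bits = claimed_bits - adc_bits       # 22 bits to conjure up

samples_needed = 4 ** extra_bits           # samples per ONE output reading
seconds_at_1msps = samples_needed / 1e6    # hypothetical 1 Msps ADC
days = seconds_at_1msps / 86400

print(f"{samples_needed:.2e} samples per reading")  # ~1.76e13
print(f"~{days:.0f} days per reading at 1 Msps")    # roughly 200 days
```

And that's the optimistic case: below the noise floor and drift of the front end, the extra "bits" are fiction no matter how long you average.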
And the review process is done by people who don't know the mechanisms of the underlying process well, but are taught to trust the statistical math. I've heard that before from an analyst, the "I don't need to know how it works..." line. Sure, they catch some things, but the conclusions they sometimes reach are totally asinine. And the people in charge of them think they can somehow figure out if the data they're getting is bad. No, they can't: they don't know why, or what it is, or how it was obtained, on anything more than a superficial level. They are not telepathic. GIGO.
So to me it's totally expected that there are garbage equations that got put together from bad data from dodgy labs.
And naturally, you know there are statistical tests to determine whether you are just reading tea leaves. However, it tends to get 'normalized' (aka the 'new normal') for those coefficients to sit within some small range because of whatever special circumstance they cite. It can take a REALLY long time to discover these kinds of issues.
And then sometimes the cases used to 'prove' things later are exceedingly simplified. And external people assume there are good and bad companies when it comes to trust, but fail to take into account that there are good and bad labs inside the same organization.