In scope for the context of this thread, though: your GP claimed that 4 sigmas means “it’ll probably pan out as being real”, and your parent provided a 6-sigma counterexample.
> your GP claimed that 4 sigmas means "it'll probably pan out as being real"
No they didn't; they claimed that 4 sigmas means it will probably turn out to be something other than statistical noise. They made no claims about "it's real" versus "it's a systematic, non-statistical error".
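For a rough sense of what "something other than statistical noise" means in numbers, here is a minimal sketch that converts a sigma count into the chance that noise alone produces a fluctuation at least that large. It assumes a Gaussian test statistic and a two-sided test, which are my assumptions and not something stated upthread; `two_sided_p_value` is just an illustrative name.

```python
import math

def two_sided_p_value(n_sigma: float) -> float:
    """Chance that pure Gaussian noise lands at least n_sigma from the mean, in either direction."""
    # erfc(x / sqrt(2)) is the two-sided tail probability of a standard normal.
    return math.erfc(n_sigma / math.sqrt(2.0))

for n in (3, 4, 5, 6):
    print(f"{n} sigma -> noise-only probability {two_sided_p_value(n):.2e}")
```

Under those assumptions, 4 sigma works out to roughly 6 in 100,000. Note that this number only speaks to statistical noise; it says nothing about whether a systematic error is hiding in the measurement.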
Or the title of this topic as it is right now is misleading. It says they’ve confirmed the stronger magnetic field, i.e. it was either predicted elsewhere or seen elsewhere. The latter would build confidence in the testing apparatus.
At the time it was a very significant result, just like this one.
Turned out someone hadn't plugged a piece of equipment in right and it was very precisely measuring that flaw in the experiment.
You can't look at any 8 sigma result and just state that it must necessarily be true. Your theory may be flawed or you may not understand your experiment and you just have highly precise data as to how you've messed something else up.
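A toy illustration of that point, as a sketch under made-up numbers: the true value is zero, the noise is Gaussian, and a hypothetical constant offset stands in for the mis-plugged equipment. None of these values come from any real experiment.

```python
import math
import random
import statistics

def significance_of_mean(samples, expected=0.0):
    """How many standard errors the sample mean sits away from the expected value."""
    mean = statistics.fmean(samples)
    stderr = statistics.stdev(samples) / math.sqrt(len(samples))
    return (mean - expected) / stderr

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 10_000
offset = 1.0             # hypothetical systematic error (e.g. a bad cable)

# Same noise model in both cases; only the unnoticed offset differs.
clean = [rng.gauss(0.0, 1.0) for _ in range(n)]
biased = [rng.gauss(0.0, 1.0) + offset for _ in range(n)]

print("no systematic:  ", round(significance_of_mean(clean), 1), "sigma")
print("with systematic:", round(significance_of_mean(biased), 1), "sigma")
```

The biased run comes out far beyond 8 sigma even though there is no real effect, because more data only shrinks the statistical error bar. The offset stays exactly where it was, measured ever more precisely.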
It's probably worth saying that even "chance" is still a little misleading, in the sense that the quantification of that chance is still done by the physicists and therefore can be biased.
Of course, this is still not good enough. But the nice thing about things that are real is they eventually stand up to increasing levels of self-doubt and 3rd party verification... it’s an extraordinary result (because, of course, the Standard Model seems to be sufficient for just about everything else... so any verified deviation is extraordinary), and so funding shouldn’t be a problem.
A decent heuristic: Real effects are those that get bigger the more careful your experiment is (and the more times it is replicated by careful outsiders), not smaller.
"Separate" for slightly small values of separate. It's the same measurement approach, and it uses many components from the first experiment, so there could be correlated errors. But they made many fundamental improvements to the experiment, so it's great to see that the effect hasn't gone away.
The primary shared component is the ring/yoke. I worked in the same lab as a substantial team of g-2 scientists for the last decade and watched them come to this result. The re-characterization of the properties of the entire instrument was extremely extensive. If anything, one should regard the lessons that they have learned along the way as providing extra insight into the properties of the original BNL measurement.
To use a car analogy: This is as if you took someone's prize-winning race car, kept the moderately-priceless chassis, installed upgraded components in essentially every other sense (removed the piston engine, installed a jet engine, removed the entire cockpit and replaced it with modern avionics, installed an entirely new outer shell, replaced the tires with materials that are two decades newer...), put the car through the most extensive testing program anyone has ever performed on a race car, filled the gas tank with rocket fuel, and took it back to Le Mans.
I believe that the likelihood of a meaningful ring-correlated systematic, while still possible, is quite low in this case. The magnetic-field mapping, shimming, and monitoring campaigns, in particular, should give people confidence that any run-to-run correlated impact of the ring ought to be very small.
It just shows probabilistic significance. Confirmation by independent research teams helps eliminate calculation and execution errors.