GPT-4 is about 45 gigabytes. https://dumps.wikimedia.org/other/kiwix/zim/wikipedia/wikipe... , a recent dump of the English wikipedia, is over twice that, and that's just English. Plus AIs are expected to know about other languages, science, who even knows how much Reddit, etc.
There literally isn't room for them to know everything about everyone when they're just asked about random people without consulting sources, and even when consulting sources it's still pretty easy for them to come in with extremely wrong priors. The world is very large.
You have to be very careful with these "on the edge" sorts of queries; that's where hallucination will be maximized.
It wouldn't matter if they trained on a quadrillion tokens, or another ten orders of magnitude. There's only so much information you can stuff into a given set of numbers.
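The capacity point is easy to see with back-of-envelope arithmetic. A sketch, using the 45 GB figure from above and a rough world-population figure (both assumptions, not measurements):

```python
# Back-of-envelope: how much storage per person a fixed-size model
# could devote to memorized facts, if it had to cover everyone.
model_bytes = 45 * 10**9        # assumed model size: ~45 GB
world_population = 8 * 10**9    # roughly 8 billion people

bytes_per_person = model_bytes / world_population
print(f"{bytes_per_person:.3f} bytes per person")  # → 5.625 bytes per person
```

A few bytes per person, and that's before spending any capacity on language, science, Reddit, or anything else, so most people simply cannot be represented at all.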
But once again I am reminded, never make arguments based on information theory. Nobody understands it.