It's usually fairly easy to verify that a piece of code is not obviously wrong; what's much harder is proving that it is not subtly wrong. When given a complete piece of code that appears to work, it is very easy to convince yourself that you understand it well enough to know it is correct, even when it isn't. This problem isn't unique to LLMs: consider the case of programmers copying binary search from textbooks without understanding how it behaves in their programming language of choice [1]. The problem is avoided (or at least minimised) by formal verification, which is where I think we should be heading with LLM code generation; it also avoids the difficulty of accurately providing a specification in plain English.
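For context, the bug behind [1] was an integer overflow in the midpoint computation of binary search, which lurked unnoticed for years in code copied from textbooks. A minimal sketch in C (assuming a fixed-width signed `int`, so `low + high` can overflow):

```c
#include <stddef.h>

/* Binary search over a sorted int array, returning the index of key
 * or -1 if absent. The midpoint is computed as low + (high - low) / 2
 * rather than (low + high) / 2: the latter overflows a signed int
 * when low + high exceeds INT_MAX -- the subtle bug described in [1]
 * that "looks" correct on review. */
int binary_search(const int *a, int n, int key) {
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;  /* overflow-safe midpoint */
        if (a[mid] < key)
            low = mid + 1;
        else if (a[mid] > key)
            high = mid - 1;
        else
            return mid;
    }
    return -1;  /* not found */
}
```

The point is that both midpoint formulas pass the "looks about right" test; only one is correct for arrays large enough to push `low + high` past `INT_MAX`.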
"this looks about right and has no obvious bugs" is my standard when reviewing human code, and it's my standard for machine-generated code too. no reason to formally verify GPT-4 outputs if I'm not formally verifying my coworker's either.
Well... after fairly long experience, we have discovered that your standard is mostly adequate for human-generated code (as long as it's not going into a critical system). That adequacy may rest on the (empirically collected) statistics of how human-written code fails: if it's wrong, it usually either "looks" wrong or fails obviously.
GPT-produced code may have different failure statistics, and therefore the human heuristic may not work for GPT-produced code. It's too early to tell.
I'm reminded of a friend who worked in radio hardware design. They'd use simulation and fuzzy/genetic algorithms to create a circuit, and then verify its performance with experiments. But they couldn't always say exactly why the circuit worked, just that it met the performance criteria.
It's an interesting divergence in software, between those who manage complexity by adding more human-understandable abstraction, and those who manage it by just verifying the results, letting the complexity fly free. All the ML stuff is definitely taking big steps down the latter path.
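One concrete form of that "just verify the results" approach is black-box property testing: treat the implementation as opaque, and check its outputs against a trivially correct oracle on random inputs, without ever understanding why it works. A minimal sketch in C (the linear-scan oracle, value ranges, and trial count are illustrative assumptions, not anyone's actual test harness):

```c
#include <stdlib.h>

/* Oracle: a trivially correct linear scan. */
static int linear_search(const int *a, int n, int key) {
    for (int i = 0; i < n; i++)
        if (a[i] == key) return i;
    return -1;
}

/* qsort comparator for ints; values here are small, so the
 * subtraction cannot overflow. */
static int cmp_int(const void *p, const void *q) {
    return *(const int *)p - *(const int *)q;
}

/* Black-box check: run an opaque search function on random sorted
 * arrays and compare its behaviour with the oracle. We never inspect
 * how `search` works -- we only verify the result property: any index
 * it returns must hold the key, and "not found" must agree with the
 * oracle. Returns 1 if all trials pass, 0 on the first violation. */
int property_test(int (*search)(const int *, int, int), int trials) {
    srand(42);  /* fixed seed so any failure is reproducible */
    for (int t = 0; t < trials; t++) {
        int n = rand() % 16;
        int a[16];
        for (int i = 0; i < n; i++) a[i] = rand() % 20;
        qsort(a, (size_t)n, sizeof a[0], cmp_int);
        int key = rand() % 20;
        int got = search(a, n, key);
        if (got >= 0 ? a[got] != key
                     : linear_search(a, n, key) != -1)
            return 0;  /* property violated */
    }
    return 1;  /* all trials passed */
}
```

This is exactly the trade the comment describes: the tested circuit (or function) stays a black box, and all the confidence comes from the verification criteria rather than from human-understandable structure.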
[1] https://ai.googleblog.com/2006/06/extra-extra-read-all-about...