It's usually fairly easy to verify that a piece of code is not obviously wrong; what's much harder is proving that it is not subtly wrong. When given a complete piece of code that appears to work, it is very easy to convince yourself that you understand it well enough to know it is correct, even when it isn't. This problem isn't unique to LLMs: consider the case of programmers copying binary search from textbooks without understanding how it behaves in their programming language of choice [1]. The problem is avoided (or at least minimised) by formal verification, which is where I think we should be heading with LLM code generation; it also avoids the difficulty of accurately providing a specification in plain English.
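For context, the bug behind [1] was an integer overflow in the midpoint computation of binary search, which lurked unnoticed for years in code copied from textbooks. A minimal sketch in C (assuming a fixed-width signed `int`, so `low + high` can overflow):

```c
#include <stddef.h>

/* Binary search over a sorted int array, returning the index of key
 * or -1 if absent. The midpoint is computed as low + (high - low) / 2
 * rather than (low + high) / 2: the latter overflows a signed int
 * when low + high exceeds INT_MAX -- the subtle bug described in [1]
 * that "looks" correct on review. */
int binary_search(const int *a, int n, int key) {
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;  /* overflow-safe midpoint */
        if (a[mid] < key)
            low = mid + 1;
        else if (a[mid] > key)
            high = mid - 1;
        else
            return mid;
    }
    return -1;  /* not found */
}
```

The point is that both midpoint formulas pass the "looks about right" test; only one is correct for arrays large enough to push `low + high` past `INT_MAX`.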
"this looks about right and has no obvious bugs" is my standard when reviewing human code, and it's my standard for machine-generated code too. no reason to formally verify GPT-4 outputs if I'm not formally verifying my coworker's either.
Well... after fairly long experience, we have discovered that your standard is mostly adequate for human-generated code (as long as it's not going into a critical system). That adequacy may rest on the (empirically collected) statistics of how human-written code fails: if it's wrong, it usually either "looks" wrong or fails obviously.
GPT-produced code may have different failure statistics, and therefore the human heuristic may not work for GPT-produced code. It's too early to tell.
I'm reminded of a friend who worked in radio hardware design. They'd use simulation and fuzzy/genetic algorithms to create a circuit, and then verify its performance with experiments. But they couldn't always say exactly why the circuit worked, just that it met the performance criteria.
It's an interesting divergence in software, between those who manage complexity by adding more human-understandable abstraction, and those who manage it by just verifying the results, letting the complexity fly free. All the ML stuff is definitely taking big steps down the latter path.
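One concrete form of that "just verify the results" approach is black-box property testing: treat the implementation as opaque, and check its outputs against a trivially correct oracle on random inputs, without ever understanding why it works. A minimal sketch in C (the linear-scan oracle, value ranges, and trial count are illustrative assumptions, not anyone's actual test harness):

```c
#include <stdlib.h>

/* Oracle: a trivially correct linear scan. */
static int linear_search(const int *a, int n, int key) {
    for (int i = 0; i < n; i++)
        if (a[i] == key) return i;
    return -1;
}

/* qsort comparator for ints; values here are small, so the
 * subtraction cannot overflow. */
static int cmp_int(const void *p, const void *q) {
    return *(const int *)p - *(const int *)q;
}

/* Black-box check: run an opaque search function on random sorted
 * arrays and compare its behaviour with the oracle. We never inspect
 * how `search` works -- we only verify the result property: any index
 * it returns must hold the key, and "not found" must agree with the
 * oracle. Returns 1 if all trials pass, 0 on the first violation. */
int property_test(int (*search)(const int *, int, int), int trials) {
    srand(42);  /* fixed seed so any failure is reproducible */
    for (int t = 0; t < trials; t++) {
        int n = rand() % 16;
        int a[16];
        for (int i = 0; i < n; i++) a[i] = rand() % 20;
        qsort(a, (size_t)n, sizeof a[0], cmp_int);
        int key = rand() % 20;
        int got = search(a, n, key);
        if (got >= 0 ? a[got] != key
                     : linear_search(a, n, key) != -1)
            return 0;  /* property violated */
    }
    return 1;  /* all trials passed */
}
```

This is exactly the trade the comment describes: the tested circuit (or function) stays a black box, and all the confidence comes from the verification criteria rather than from human-understandable structure.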
[1] https://ai.googleblog.com/2006/06/extra-extra-read-all-about...