It's actually not arbitrary! I measure my PR rate / ticket close rate before and after, and those are generally tied to agreed-upon features / bugs (often user-requested / user-reported ones). I think commit rate or lines of code would be less meaningful, but a (non-refactoring) PR should at least indicate some increase in user value or a bug fix. Sure, we could categorize it further and break it down more effectively. I won't die on the hill of 3x; maybe it's 1.5x, maybe it's 4x. Neither seems a very meaningful difference when the comparison being discussed is 0x or even -X. The latter, I think, is _most_ of the time going to be prompt- or task-related, which is why I think it's so important to share and discuss (particularly the negative case!).
Which would be ironic, as LLM usage has been observed to increase the sensation of productivity even when productivity is measurably reduced. Not to mention the "vibe" component of vibe coding.