Hacker News | dakiol's comments

Honest question: if you're using multiple agents, it's usually not to produce a dozen lines of code. It's to produce a big enough feature spanning multiple files, modules and entry points, with tests and all. So far so good. But once that feature is written by the agents... wouldn't you review it? Like reading line by line what's going on and detecting if something is off? And wouldn't that part, the manual reviewing, take an enormous amount of time compared to the time it took the agents to produce it? (You know, it's more difficult to read other people's/machine code than to write it yourself.) Meaning all the productivity gained is thrown out the door.

Unless you don't review every generated line manually, and instead rely on, let's say, UI e2e testing, or perhaps unit testing (that the agents also wrote). I don't know, perhaps we are past the phase of "double check what agents write" and are now in the phase of "ship it. if it breaks, let agents fix it, no manual debugging needed!" ?


Here's what I suggest:

Serious planning. The plans should include constraints, scope, escalation criteria, completion criteria, and a test and documentation plan.

Enforce single responsibility, CQRS, domain segregation, etc. Make the code as easy for you to reason about as possible. Enforce domain naming and function/variable naming conventions to make the code as easy to talk about as possible.

Use code review bots (Sourcery, CodeRabbit, and Codescene). They catch the small things (violations of contract, antipatterns, etc.) and the large (ux concerns, architectural flaws, etc.).

Go all in on linting. Make the rules as strict as possible, and tell the review bots to call out rule subversions. Write your own lints for the things the review bots are complaining about regularly that aren't caught by lints.
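Writing your own lints is lighter-weight than it sounds. Here's a toy, stdlib-only Python sketch (the rule and names are illustrative, not from any particular linter) of the kind of check you might automate once a review bot keeps flagging it, in this case bare `except:` clauses:

```python
import ast


def find_bare_excepts(source: str) -> list[int]:
    """Return line numbers of bare `except:` clauses in the given source."""
    tree = ast.parse(source)
    hits = []
    for node in ast.walk(tree):
        # An ExceptHandler with no exception type is a bare `except:`.
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            hits.append(node.lineno)
    return hits


sample = """\
try:
    risky()
except:
    pass
"""
print(find_bare_excepts(sample))  # → [3]
```

Real custom rules would live in a flake8/Ruff/ESLint plugin rather than a standalone script, but the core of most of them is exactly this: walk the syntax tree and report nodes matching a pattern.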

Use BDD alongside unit tests, read the .feature files before the build and give feedback. Use property testing as part of your normal testing strategy. Snapshot testing, e2e testing with mitm proxies, etc. For functions of any non-trivial complexity, consider bounded or unbounded proofs, model checking or undefined behaviour testing.
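Property testing in particular is less exotic than it sounds: you state an invariant and throw random inputs at it. A stdlib-only Python sketch of the idea (libraries like Hypothesis add generation strategies and failure shrinking on top of this):

```python
import random


def check_property(prop, gen, trials=200):
    """Hand-rolled property test: generate random cases, assert the invariant."""
    for _ in range(trials):
        case = gen()
        assert prop(case), f"property failed for {case!r}"


# Example invariant: sorting preserves length and is idempotent.
def sort_props(xs):
    return len(sorted(xs)) == len(xs) and sorted(sorted(xs)) == sorted(xs)


def random_list():
    return [random.randint(-100, 100) for _ in range(random.randint(0, 20))]


check_property(sort_props, random_list)
print("all trials passed")
```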

I'm looking into mutation testing and fuzzing too, but I am still learning.

Pause for frequent code audits. Ask an agent to audit for code duplication, redundancy, poor assumptions, architectural or domain violations, TOCTOU violations. Give yourself maintenance sprints where you pay down debt before resuming new features.

The beauty of agentic coding is, suddenly you have time for all of this.


> Serious planning. The plans should include constraints, scope, escalation criteria, completion criteria, test and documentation plan.

I feel like I am a bit stupid for not being able to do this. My process is more iterative. I start working on a feature, then I discover some other function that's slightly related, go refactor into common code, then proceed with the original task. Sometimes I stop midway and see if this can be done with a library somewhere and go look at examples. I take many detours like these. I am never working on a single task like a robot. I don't want Claude to work like that either. That seems so opposite to how my brain works.

What am I missing?


Again, here's what works for me.

When I get an idea for something I want to build, I will usually spend time talking to ChatGPT about it. I'll request deep research on existing implementations, relevant technologies and algorithms, and a survey of the literature. I find NotebookLM helps a lot at this point, as does Elevenreader (I tend to listen to these reports while walking or doing the dishes or what have you). I feed all of those into ChatGPT Deep Research along with my own thoughts about the direction of the system, and ask it to produce a design document.

That gets me something like this:

https://github.com/leynos/spycatcher-harness/blob/main/docs/...

If I need further revisions, I'll ask Codex or Claude Code to do those.

Finally, I break that down into a roadmap of phases, steps and achievable tasks using a prompt that defines what I want from each of those.

That gets me this:

https://github.com/leynos/spycatcher-harness/blob/main/docs/...

Then I use an adapted version of OpenAI's execplans recipe to plan out each task (https://github.com/leynos/agent-helper-scripts/blob/main/ski...).

The task plans end up looking like this:

https://github.com/leynos/spycatcher-harness/blob/main/docs/...

At the moment, I use Opus or GPT-5.4 on high to generate those plans, and Sonnet or GPT-5.4 medium to implement.

The roadmap and the design are definitely not set in stone. Each step is a learning opportunity, and I'll often change the direction of the project based on what I learn during the planning and implementation. And of course, this is just what works for me. The fun of the last few months has been everyone finding out what works for them.


You seem to work a lot like how I do. If that is being stupid, then well, count me in too. To be honest, if I had to go through all the work of planning, scope, escalation criteria, etc., then I would probably be better off just writing the damn code myself at that point.

I see lots of posts, like Stripe's Minion, where they just type a feature into a Slack chat and the agent goes and does it. That doesn't make any sense to me.

To be devil's advocate:

Many of those tools are overpowered unless you have a very complex project that many people depend on.

The AI tools will catch the most obvious issues, but will not help you with the most important aspects (e.g. whether your project is useful, or the UX is good).

In fact, having this complexity from the start may kneecap you (the "code is a liability" cliché).

You may be "shipping a lot of PRs" and "implementing solid engineering practices", but how do you know if that is getting closer to what you value?

How do you know that this is not actually slowing you down?


It depends a lot on what kind of company you are working at. For my work, the product concerns are taken care of by other people. I'm responsible for technical feasibility, alignment, and design, but not for what features should be built, validating whether they are useful and add value, etc.; product people take care of that.

If you are solo or in a small company, you apply the complexity you need. You can even do it incrementally: when you see a pattern of issues repeating, address those over time, hardening the process from lessons learnt.

Ultimately the product discussion is separate from the engineering concerns on how to wrangle these tools, and they should meet in the middle so overbearing engineering practices don't kneecap what it is supposed to do: deliver value to the product.

I don't think there's a hard set of rules that can be applied broadly, the engineering job is to also find technical approaches that balance both needs, and adapt those when circumstances change.


On the one side I reject that product and engineering concerns are separated: Sometimes you want to avoid a feature due to the way it will limit you in the future, even if the AI can churn it in 2 minutes today.

On the other side perhaps your company, like most, does not know how to measure overengineering, cognitive complexity, lack of understanding, balancing speed/quality, morale, etc. but they surely suffer the effects of it.

I suspect that unless we get fully automated engineering / AGI soon, companies that value engineers with good taste will thrive, while those that double down into "ticket factory" mode will stagnate.


> On the one side I reject that product and engineering concerns are separated: Sometimes you want to avoid a feature due to the way it will limit you in the future, even if the AI can churn it in 2 minutes today.

That is exactly not what I meant, I'm sorry if it wasn't clear but your assumption about how my job works is absolutely wrong.

I even mention that the product discussion is separate only on "how to wrangle these tools":

> Ultimately the product discussion is separate from the engineering concerns on how to wrangle these tools, and they should meet in the middle so overbearing engineering practices don't kneecap what it is supposed to do: deliver value to the product.

Delivering value, which means also avoiding a feature that will limit or entrap you in the future.

> On the other side perhaps your company, like most, does not know how to measure overengineering, cognitive complexity, lack of understanding, balancing speed/quality, morale, etc. but they surely suffer the effects of it.

We do measure those and are quite strict about it, most of my design documents are about the trade-offs in all of those dimensions. We are very critical about proposals that don't consider future impacts over time, and mostly reject workarounds unless absolutely necessary (and those require a phase-out timeline for a more robust solution that will be accounted for as part of the initiative, so the cost of the technical debt is embedded from the get-go).

I believe I wasn't clear and/or you misunderstood what I said. I agree with you on all these points, and the company I work for is very much the opposite of a "ticket factory". Rejecting work out of concern for its overall cross-boundary impact is very much praised, and invited.

My comment was focused on how to wrangle these tools for engineering purposes being a separate discussion to the product/feature delivery, it's about tool usage in the most technical sense, which doesn't happen together with product.

We on the engineering side determine how to best apply these tools for the product we are tasked on delivering, the measuring of value delivered is outside and orthogonal to the technical practices since we already account for the trade-offs during proposal, not development time. This measurement already existed pre-AI and is still what we use to validate if a feature should be built or not, its impact and value delivered afterwards, and the cost of maintaining it vs value delivered. All of that includes the whole technical assessment as we already did before.

Determining if a feature should be built or not is ultimately a pairing of engineering and product, taking into account everything you mentioned.

Determining the pipeline of potential future non-technical features at my job is not part of engineering, except for side-projects/hack ideas that have potential to be further developed as part of the product pipeline.


Sorry, I think you're right that I misinterpreted your comment. I still had in mind OP's example (BDD, mutational testing, all that jazz). I apologize!

Reading your comment, it looks like you work for a pretty nice company that takes those things seriously. I envy you!

My concern was that for companies unlike yours that don't have well established engineering practices, it _feels_ like with AI you can go much faster, and in fact it's a great excuse to dismantle any remaining practices. But in reality they're either doing busywork or building the wrong thing. My guess is that those companies are going to learn that this is a bad idea in the future, when they already have a mess to deal with.

To put what I mean into perspective... if you browse OP's profile you can find absolutely gigantic PRs like https://github.com/leynos/weaver/pull/76. I cannot review any PR like that in good faith, period.


Can't upvote you enough. This is the way. You aren't vibe coding slop; you have built an engineering process that works even if the tools aren't always reliable. This is the same way you'd build out a functioning and highly effective team of humans.

The only obvious bit you didn't cover was extensive documentation including historical records of various investigations, debug sessions and technical decisions.


Documentation is only useful if it is read. I have found it impossible to get many humans to read the documentation I write.

Building a fancy-looking process doesn't mean the output isn't slop. Vibecoders on reddit have even more insane "engineering" processes. The parent comment has all of these:

Architecture & Design Principles
• Single Responsibility Principle (SRP)
• CQRS (Command Query Responsibility Segregation)
• Domain segregation
• Domain-driven naming conventions
• Clear function/variable naming standards
• Architectural constraint definition
• Scope definition
• Escalation criteria design
• Completion criteria definition

Planning & Process
• Formal upfront planning
• Constraint-based design
• Defined scope management
• Escalation protocols
• Completion criteria tracking
• Maintenance sprints (technical debt paydown)
• Frequent code audits

AI / Agentic Development Practices
• Agent-assisted code audits
• Agent-based feedback loops (e.g., reading .feature files pre-build)
• Agent-driven reasoning optimization (code clarity for AI)
• Continuous automated review cycles

Code Review & Static Analysis
• Code review bots: Sourcery, CodeRabbit, CodeScene
• Automated detection of: anti-patterns, contract violations, UX concerns, architectural flaws

Linting & Code Quality Enforcement
• Strict linting rules
• Custom lint rules
• Enforcement of lint compliance via bots
• Detection of lint rule subversion

Testing Strategies

Core Testing
• Unit testing
• BDD (Behavior-Driven Development)
• .feature file validation before build

Advanced Testing
• Property-based testing
• Snapshot testing
• End-to-end (E2E) testing, with MITM (man-in-the-middle) proxies

Formal / Heavyweight Testing
• Model checking
• Bounded proofs
• Unbounded proofs
• Undefined behavior testing

Emerging / Exploratory
• Mutation testing
• Fuzzing

Code Quality & Auditing
• Code duplication detection
• Redundancy analysis
• Assumption validation
• Architectural compliance checks
• Domain boundary validation
• TOCTOU (time-of-check to time-of-use) vulnerability analysis

Development Workflow Enhancements
• Continuous audit cycles
• Debt-first maintenance phases
• Feedback-driven iteration
• Pre-build validation workflows

Security & Reliability Considerations
• TOCTOU vulnerability detection
• MITM-based E2E testing
• Undefined behavior analysis
• Fuzz testing (planned)


And here I am, just drawing diagrams on a whiteboard and designing UI in Balsamiq.

You're probably shipping, so that puts you ahead of most people still setting up their perfect process.

This is the biggest bottleneck for me. What's worse is that LLMs have a bad habit of being very verbose and rewriting things that don't need to be touched, so the surface area for change is much larger.

Not only that, but LLMs do a disservice to themselves by writing long-winded code and decorating lines with redundant comments, which wastes their own context the next time they work with it.

I have had good luck asking my agent: "Now review this change: is it a good design, does it solve the problem, are there excessive comments, is there anything else a reviewer would point out?" I'm still working on exactly what prompt to use, but that is about right.

It's kind of weird; I jumped on the vibe-coding opencode bandwagon, but using a local 395+ w/128: Qwen Coder. Now, it takes a bit to get the first tokens flowing, and the cache works well enough to keep it going, but it's not fast enough to just set it and forget it, and it's clear when it goes in an absurd direction and either deviates from my intention or simply loads some context where it should have followed a pattern, whatever.

I'm sure these larger models are both faster and more cogent, but it's also clear that what matters is managing their side tracks and cutting them short. Then I started seeing the deeper problematic pattern.

Agents aren't there to increase multifactor productivity; their real purpose is to shorten context to manageable levels. In effect, they're basically trying to reduce the odds of longer-context poisoning.

So, if we boil down the probability of any given token triggering the wrong subcontext, it's clear that the greater the context, the greater the odds of a poison substitution.
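As a back-of-the-envelope illustration of that scaling (assuming, unrealistically, that each token independently carries the same tiny chance p of triggering the wrong subcontext), the chance of at least one derailment grows fast with context length:

```python
def poisoning_odds(p: float, n: int) -> float:
    """Chance of at least one bad trigger across n independent tokens."""
    return 1 - (1 - p) ** n


# Even a minuscule per-token failure rate compounds over long contexts:
for n in (1_000, 10_000, 100_000):
    print(n, round(poisoning_odds(1e-5, n), 3))
# → 1000 0.01, 10000 0.095, 100000 0.632
```

The independence assumption is of course a fiction, but it captures why shortening context is a defence in itself.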

Then that's really the problematic issue every model is going to contend with, because there's zero reality in which a single model is good enough. So now you're onto agents, breaking a problem into more manageable subcontexts and trying to fold those back into the larger context gracefully, etc.

Then that fails, because there's zero consistent determinism, so you end up at the harness, trying to herd the cats. This is all before you realize that these businesses can't just keep throwing GPUs at everything, because the problem isn't compute-bound; it's contextual/DAG-limited the same way a brain is limited.

We've all got intelligence, and we use several orders of magnitude less energy doing mostly the same thing.


I highly recommend adding `/simplify` to your workflow. It walks back over-engineerings quite often for me.

It’s a blend. There are plenty of changes in a production system that don’t necessarily need human review. Adding a help link. Fixing a typo. Maybe upgrades with strong CI/CD, or simple UI improvements, or safe experiments.

There are features you can ship safely behind feature flags or staged releases. As you push further, you find that with the right tooling it can be a lot.
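Staged releases of this sort often boil down to deterministic user bucketing. A minimal sketch (the function and flag names are made up for illustration):

```python
import hashlib


def enabled_for(user_id: str, flag: str, rollout_pct: int) -> bool:
    """Deterministically bucket a user into a staged rollout (0-100%)."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100 < rollout_pct


# The same user always lands in the same bucket, so a flag can be widened
# from 5% to 50% to 100% without flip-flopping anyone's experience.
print(enabled_for("user-42", "new_help_link", 100))  # → True
print(enabled_for("user-42", "new_help_link", 0))    # → False
```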

If you break it down, often quite a bit can be deployed safely with minimal human intervention (it depends on the domain, naturally, but this holds for a lot of systems).

I’m aiming to revamp the whole process - I wrote a little on it here: https://jonathannen.com/building-towards-100-prs-a-day/


I use coding agents to produce a lot of code that I don’t ship. But I do ship the output of the code.

Yep. In many cases I am just reviewing test cases it generated now.

> if it breaks, let agents fix it, no manual debugging needed!" ?

It's pretty trivial now to have every Sentry issue get an immediate first pass by AI to attempt to solve the bug.


> you know, it's more difficult to read other people's/machine code than to write it yourself

Not at all, it's just a skill that gets easier with practice. Generally if you're in the position to review a lot of PR's, you get proficient at it pretty quickly. It's even easier when you know the context of what the code is trying to do, which is almost always the case when e.g. reviewing your team-mates' PR's or the code you asked the AI to write.

As I've said before (e.g. https://news.ycombinator.com/item?id=47401494), I find reviewing AI-generated code very lightweight because I tend to decompose tasks to a level where I know what the code should look like, and so the rare issues that crop up quickly stand out. I also rely on comprehensive tests and I review the test cases more closely than the code.

That is still a huge amount of time savings, especially as the scope of tasks has gone from functions to entire modules.

That said, I'm not slinging multiple agents at a time, so my throughput with AI is way higher than without AI, but not nearly as much as some credible reports I've heard. I'm not sure they personally review the code (e.g. they have agents review it?) but they do have strategies for correctness.


I'll often run 4 or 5 agents in parallel. I review all the code.

Some agents will be developing plans for the next feature, but there can sometimes be up to 4 coding.

These are typically a mix between trivial bug fixes and 2 larger but non-overlapping features. For very deep refactoring I'll only have a single agent run.

Code reviews are generally simple since nothing of any significance is done without a plan. First I run the new code to see if it works. Then I glance at diffs and can quickly ignore the trivial var/class renames, new class attributes, etc., leaving me to focus on the new significant code.

If I'm reviewing feature A I'll ignore feature B code at this point. Merge what I can of feature A then repeat for feature B, etc.

This is all backed by a test suite I spot check, and linters for, e.g., required security classes.

Periodically we'll review the codebase for vulnerabilities (e.g. incorrectly scoped DB queries) and for redundant/cheating tests.

But the keys to multiple concurrent agents are plans where you're in control ("use the existing mixin", "nonsense, do it like this" etc) and non-overlapping tasks. This makes reviewing PRs feasible.


Are you kidding? What else would managers get credit from? They don't produce anything the company is interested in. They steer, they manage, and so if the ones being managed produce the thing the company is interested in, then sure, all the credit goes to the team (including the manager!).

As it usually happens, getting credit means nothing if not accompanied by a salary bump or something like that. And as it usually happens, not the whole team can get a salary bump. So the ones who get the bump are usually one or two seniors on the team, plus the manager of course... because the manager is the gatekeeper between upper management (the ones who approve salary bumps) and the ICs... and no sane manager would sacrifice a salary bump for themselves just to give it away to an IC. And that's not being a bad manager, that's simply being human.

Also, if you think about it, if the team succeeded in delivering "the thing", then the manager would think it's partially because of their managing, and so he/she would believe a salary bump is deserved.

When things go south, no penalization is made. A simple "post-mortem" is written in confluence and people write "action items". So, yeah, no need for the manager to get the blame.

It's all very shitty, but it's always been like that.


I don't understand the "being more productive" part. Like, sure, LLMs make us iterate faster but our managers know we're using them! They don't naively think we suddenly became 10x engineers. Companies pay for these tools and every engineer has access to them. So if everyone is equally productive, the baseline just shifted up... same as always, no?

Mentioning LLM usage as a distinction is like bragging about using a modern compiler instead of writing assembly. Yeah it's faster, but so is everyone else's code... Besides, I wouldn't brag about being more productive with LLMs because it's a double-edged sword: it's very easy to use them, and nobody is reviewing all the lines of code you are pushing to prod (really, when was the last time you reviewed an AI-generated PR that changed 20+ files and added/removed thousands of lines of code?), so you don't know the long game of your changes; they seem to work now, but who knows how it will turn out later?


Sometimes outcomes and achievements and work product are useful beyond just... stack ranking yourself against your peers. Seems so odd to me that this is your mentality unless you're earlier in your career.

Fair enough. I've been in software longer than I would like to admit. And the longer I'm in, the less I care about achievements in a work environment. All I care about is that the company pays me every month, because companies don't care about me (they care about my output per hour/week/month). So it's essential to rank yourself high against your peers (ethically and the like, of course), otherwise you are out in the next layoff. I know not every company is like this, but the vast majority of tech companies are.

Outside of work, yeah, everything is fine and there's nothing but the pure pursuit of knowledge and joy.


People would really be better off seeing themselves as mercenaries with health benefits. You are nothing more. You learn, you make friends, but your job is ephemeral. Do it, but don't get attached TO it.

The key there is "vast majority of tech companies". And I agree with you.

I think the next big movement in tech will be ALL companies becoming tech companies. Right now there are hundreds of thousands of "small" companies with big enough budgets to pay for a CTO to modernize their stack and lead them into the 21st century.

The problem is they don't know they have this problem and so they aren't actively hiring for a CTO. You've got to go find them and insert yourself as the solution.


All companies are like this. Some just have better HR/PR.

Usually hedonic adaptation ends up catching up, and then it’s just the new baseline.

> like bragging about using a modern compiler instead of writing assembly.

Yet people look at me like I'm the odd one out when I say I am more productive with a modern compiler like GHC.


It's not just about gas pricing; it's also about housing. E.g., why live in Paris, Madrid, Barcelona, or Milan if you can live in a cheaper (and way less populated) city? Going back to the office, even if it's 2 days/week, completely defeats decentralization of housing in most of Europe.


> This is the norm now for the past few years, and is one of the few ways to protect your job from being fully offshored.

Not necessarily true. A company that operates 100% remotely in country X can't necessarily hire people from other countries (and let them work there). I work for a French company, 100% remote. The company doesn't have branches in other countries, so everyone works within France. This is ideal, because the HQ is in Paris and many people don't (want to) live in Paris. Having to go to the office 2-3 times per week would make it impossible for my company to hire outside of Paris... which is idiotic.


> company that operates 100% remotely in country X can't necessarily hire people from other countries (and let them work there)

Can't speak for French companies aside from some players in DefenseTech and Quantum, but for most American companies this is a solved problem already - we already have a legal entity in most jurisdictions or the ability to spin one up within a couple days.

Additionally, if an organization is spending enough to open a dedicated branch in a country (even if it's only going to house 20-30 people), we tend to get FDI grants and subsidies unlocked.

Pasqal did something similar when opening up their American campus in Chicago.

> Having to go to the office 2-3 times per week, makes it impossible for my company to hire outside of Paris... which is idiotic

There's no reason to - you aren't getting a significant cost benefit shifting hiring from Paris to (e.g.) Toulouse, and are only incurring an additional operational headache.

At that point you may as well open a Francophone development office in Rabat or Tunis, or shift the office to Bucharest or Prague because the CEE countries can outcompete France in ICT hiring subsidies.


Nobody wants to review AI-generated code (unless we are paid for doing so). Open source is fun, that's why people do it for free... adding AI to the mix is just insulting to some, and boring to others.

Like, why on earth would I spend hours reviewing your PR that you/Claude took 5 minutes to write? I couldn't care less if it improves (best case scenario) my open source codebase; I simply don't enjoy the imbalance.


> Like, why on earth would I spend hours reviewing your PR that you/Claude took 5 minutes to write?

If the PR does what it says it does, why does it actually matter if it took 2 weeks or 2 minutes to put together, given that it's the equivalent level of quality on review?


“It works” is the bare minimum. Software is maintained for decades and should have a higher bar of quality.


> given that it's the equivalent level of quality on review?


One reason: if it takes 2 minutes to put together a PR, then you'll get an avalanche of contributions that you have no time to review. Sure, I can put AI in front to do the review, but then what's the point of my having an open source project?


> but then what's the point of my having an open source project?

For some people, the point was precisely to improve the software available to the global commons through a thriving and active open source effort. "Too many people are giving me too many high-quality PRs to review" is hardly something to complain about, even if you have to just pick them randomly to fit them in the time you have without AI (or other committers) to help review.

If your idea of open source is just to share the code you wanted to work on and ignore contributions, you can do that too. SQLite does that, after all.


> If the PR does what it says it does, why does it actually matter if it took 2 weeks or 2 minutes to put together, given that it's the equivalent level of quality on review?

You're right that the issue isn't how many minutes it took. The issue is that it's slop. Reviewing thousands of lines of crappy code is unpleasant whether they were autogenerated or painstakingly handcrafted. (Of course, few humans have the patience and resistance to learning to generate the amount of terrible code that AIs do routinely).


I get the frustration but I think this take only holds if you assume AI generated code is inherently worse. If someone uses Claude to scaffold the boilerplate and then actually goes through it properly, the end result is the same code you would have written by hand, just faster. The real problem is when people submit 14k lines they clearly did not read through. But that is a review process problem, not an AI problem. Bad PRs existed long before AI.


I resonate with OP a lot, and in my opinion, it's not about the code quality. It's about the effort that was put in, like in each LOC. I can't quite put it in words, but, like, the art comparison works quite well. If someone generates a painting with Gemini, it makes it somewhat heartless. It may still be good and bring the project forward (in case of this PR), but it lost every emotional value.

I would probably never be able to review this kind of code in open source projects without any financial compensation, because of that reason. Not because I don't like LLMs, not use LLMs, or think their code is of bad quality. But, while without LLMs I know there was a person who sat down and wrote all this in painstaking work, now I know that he or she barely steered a robot that wrote it. It may still be good work, and the steering and prompting is still work and requires skill, but for me I would not feel any emotional value in this code, and it would make it A LOT harder to gather motivation to review it. Interestingly, when I think about it, I realize that I would inherently have motivation to find out how the developer prompted the agent.

Like, you know, when I see a wooden statue that I know was designed and carved by someone over months of work, I can appreciate every single edge of the wood much more than if there's a statue that was designed by someone but carved by some kind of wooden CNC machine. It may be the same statue, of the same or even better quality, and it was still skillful work, but I lose my connection to it.

Can't quite pinpoint it, but for me, it seems, the human aspect is really important here, at least when it's about passion and motivation.

Maybe that made some sense, idk. I just wrote out of my ass.


Yes and no. Previously when someone submitted a 14k line PR you could be assured that they'd at least put a significant amount of time and effort into it, and the result was usually a certain floor on the quality level. Now that's no longer true.


In theory because the code being added is introducing a feature so compelling that it is worth it. In practice, that’s rarely the case.

My personal approach to open source is more or less that when I need a piece of software to exist that does not and there is no good reason to keep it private, it becomes open source. I don’t do it for fun, I do it because I need it and might as well share it. If someone sends me a patch that enhances my use case, I will work with them to incorporate it. If they send me a patch that only benefits them it becomes a calculus of how much effort would it take for me to review it. If the effort is high, my advice is to fork the project or make it easier for me to review. Granted I don’t maintain huge or vital projects, but that’s precisely why: I don’t need yet another programming language or runtime to exist and I wouldn’t want to work on one for fun.


Why do you care how much effort it took the engineer to make it? If there was a huge amount of tedium that they used Claude Code for, then reviewed and cleaned up so that it’s indistinguishable from whatever you’d expect from a human; what’s it to you?

Not everyone has the same motivations. I’ve done open source for fun, I’ve done it to unblock something at work, I’ve done it to fix something that annoys me.

If your project is gaining useful functionality, that seems like a win.


Because sometimes programming is an art and we want people to do it as if it was something they cared about. I play chess and this is a bit like that. Why do I play against humans? Because I want to face another person like me and see what strategies they can come up with.

Of course any chess bot is going to play better, but that's not the point


What about the other times?


I don't think node virtual filesystems is anything like chess.


Solving problems is not like chess? I want to use my brain, not sure why that's so complicated to understand


[flagged]


TIL that when I do anything that makes society label me as a "developer", I am not allowed to enjoy it, or feel about it in any way, as it's now a job, entirely neutral in nature, and I gotta do it, whether I hate or enjoy it - no attached emotions allowed.


Ignore the mercenaries. Here they are legion.

As for us (aspiring) craftsman, there are dozens of us! Dozens!


> Why do you care how much effort it took the engineer to make it?

Because they're implicitly asking me to put in effort as a reviewer. Pretending that they put more effort in than they have is extremely rude, and intentionally or not, generating a large volume of code amounts to misleading your potential reviewers.

> If there was a huge amount of tedium that they used Claude Code for, then reviewed and cleaned up so that it’s indistinguishable from whatever you’d expect from a human; what’s it to you?

They never do, though. These kinds of imaginary good AI-based workflows are a "real communism has never been tried" thing.

> If your project is gaining useful functionality, that seems like a win.

Lines of code impose a maintenance cost, and that goes triple when the code quality is low (as is always the case for actually existing AI-generated code). The cost is probably higher than the benefit.


I hate being paid to review AI slop.


If it's any consolation, at my company (and others) we have shifted the way we are doing tech interviews: nowadays we focus more on CS/software-engineering fundamentals rather than ability to code. We also still do the systems design interviews (for which fundamentals of networking and design are needed).

We care way less now about leetcode-like interviews (we used to have them before, but not anymore)


> 1. Does testing a candidate's ability to "steer" and debug AI-generated code make more sense to you than traditional algorithms?

Testing the candidate's ability to "steer" agents seems to be like testing their ability to know the Java API or to recite SOLID by heart.

> 2. How are you currently preventing these "prompt-only" developers from slipping through your own interview loops?

We don't ask leetcode anymore. We keep the usual systems design interview, in which usage of AI is not needed (or at least we don't allow it, because in this kind of interview we are more interested in seeing how the candidate thinks and so on).

We have a new stage in our job interview, though: generic Q/A about the fundamentals of software engineering and computer science. Again, we don't care anymore how candidates produce code. We care about what they know and what they don't know: what's the scope of their knowledge, and when do they need to rely on AI to come up with an answer. Silly (non-real) example: "Can you write a program that detects if another program halts?". The people we want are the ones who would say something about the Halting Problem but also be practical and perhaps ask more questions about such a program's requirements.
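As a sketch of why the "right" answer to that silly example is the Halting Problem: the classic diagonalization argument can even be demonstrated in a few lines of Python. All names here are made up for illustration; this is a proof sketch, not real interview material.

```python
def defeat(halts):
    """Given any claimed halting oracle halts(f) -> bool, build a
    zero-argument function g that the oracle must misjudge
    (the classic diagonalization behind the Halting Problem)."""
    def g():
        if halts(g):
            while True:  # oracle said "g halts", so loop forever
                pass
        # oracle said "g doesn't halt", so return immediately
    return g

# Try the (obviously wrong) oracle that claims nothing ever halts:
never_halts = lambda f: False
g = defeat(never_halts)
g()  # returns right away, i.e. it halts, contradicting the oracle
```

The same construction defeats any other candidate oracle, which is why the practical follow-up questions (what class of programs? what timeout?) are what actually matter.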

You get the point: we look for people with a good breadth of knowledge, who can communicate well and know their shit. Whether they can use tool x or y (including LLMs) is taken for granted for such people.


This is a fantastic perspective, thank you. You hit the nail on the head: the ultimate goal is testing fundamental engineering breadth and systems thinking, not tool usage.

I should definitely clarify my use of the word steering — I completely agree that testing prompt engineering is just the new API memorization, which is useless.

By steering, I mean putting them in a situation where the AI generates a plausible but architecturally flawed solution, and seeing if they have the fundamental knowledge to spot the BS, understand the scope of the problem, and fix it.

Basically, an automated way to test the exact critical thinking you mentioned.

I love your approach of dropping LeetCode for fundamentals Q/A and Systems Design. But out of curiosity, how do you scale that at the top of the funnel? Doing deep, manual 1-on-1 assessments gives the best signal by far, but doesn't that burn a massive amount of your senior engineers' time?


> You could capture the behavior of every falling object on Earth in three variables and describe the relationship between matter and energy in five characters.

What we can do is approximate. Newton had a good approximation of gravitation some time ago (force equals a constant times the product of two masses divided by the distance squared; super readable indeed). But nowadays there's a better one that doesn't look like Newton's theory (Einstein's field equations, which look compact but nothing like Newton's). So, what if in a thousand years we have a yet better approximation of gravity, but it's encoded in millions of variables? (Perhaps in the form of a neural network of some futuristic AI model?)
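For reference, the two approximations mentioned above in their standard textbook forms (note that the "compact" Einstein form hides ten coupled nonlinear PDEs in the tensor notation):

```latex
% Newton's law of universal gravitation
F = G \frac{m_1 m_2}{r^2}

% Einstein's field equations
G_{\mu\nu} + \Lambda g_{\mu\nu} = \frac{8\pi G}{c^4} T_{\mu\nu}
```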

My point is: whatever we know about the universe now hasn't necessarily "captured" the underlying essence of the universe. We approximate. Approximations are useful and handy and will move humanity forward, but let's not forget that "approximations != truth".

If we ever discover the underlying "truth" of the universe, we would look back and confidently say "Newton was wrong". But I don't think we will ever discover such a thing; therefore, sure, approximations are our "truth", but sometimes people forget.


Einstein’s equations look like Newton’s in the limit. It would be a little weird if we ended up having to add millions of additional parameters over the next thousand years. At the current rate we seem to get multiple years per parameter, rather than hundreds of parameters per year, right?
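To spell out the "in the limit" claim: in the weak-field, slow-motion limit, general relativity reduces to Newtonian gravity. Roughly:

```latex
% Weak field: metric nearly flat, with g_{00} \approx -(1 + 2\Phi/c^2).
% The field equations then reduce to the Newtonian Poisson equation
\nabla^2 \Phi = 4\pi G \rho
```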


This kind of view tends to conclude logically in the idea of a noumenal, unknowable reality. I think it's more reasonable to say that truth itself is a gold star we award to descriptions that suit our purposes. After all, descriptions are necessarily approximations (or reductive, or "compressions"), since the only model of a thing with 100% fidelity is... the thing itself.


Agreed!


> While this could be done by junior or senior, I think junior usually has the slight advantage in being more AI-native and knowing how to effectively prompt and work with AI, though not always.

But juniors don't (usually) have the knowledge to assess whether what the AI has produced is OK or not. I agree that anybody (junior or senior) can produce something with AI; the key question is whether the same person has the skills to assess (e.g., by asking the right questions) that the produced output is what's needed. In my experience, junior + AI is just a waste of money (tokens) and a nightmare to take accountability for.


I don't see the value of a junior instructing an AI, because I as a senior can also instruct an AI.

I perceive the AI itself as a very fast junior that I pair program with. So you basically need the seniority to be able to work with a "junior AI".

The bar for human juniors is now way higher than it used to be.


>The bar for human juniors is now way higher than it used to be.

What do you think that bar is now? How does someone signal being 'past the bar'? If I hand-wrote a toy Gaussian splat renderer, is that better than someone who used AI to implement a well-optimized one with lots of features in Vulkan?


'past the bar' means you have to be smarter than AI, simple as that. You need to be able to tell when it delivers good work, and when not. If you are not smarter than AI, you will not be able to tell the difference. And then what is your added value?


Perhaps in a year or so the AI will tell the human juniors what to do

