Hacker Newsnew | past | comments | ask | show | jobs | submit | sornaensis's commentslogin

IMO the solution is the same as org security: fine grained permissions and tools.

Models/Agents need a narrow set of things they are allowed to actually trigger, with real security policies, just like people.

You can mitigate agent->agent triggers by not allowing direct prompting, but by feeding structured output of tool A into agent B.


I've been working on my own thing with more of a 'management' angle to it. It lets me connect memories to tasks and projects across all of my workspaces, and gives me a live SPA to view and edit everything, making controlling what the models are doing a lot easier in my experience https://github.com/Sornaensis/hmem in a way that suits how I think vs other project management or markdown systems.

I would be interested in trying to make the models go into more of a research mode and organize their knowledge inside it, but I've found this turns into something like LLM soup.

For coding projects, the best experience I have had is clear requirements and a lot of refinement followed through with well documented code and modules. And only a few big 'memories' to keep the overall vision in scope. Once I go beyond that, the impact goes down a lot, and the models seem to make more mistakes than I would expect.


As a rule people do not read the linked content, they come to discuss the headline.

The first indication to me this was AI was simply the 'project structure' nonsense in the README. Why AI feel this strong need to show off the project's folder structure when you're going to look at it via the repo anyway is one of life's current mysteries.


Honestly, maybe this is the problem.

A web-of-trust-like implementation of votes and flags, as suggested below, might be a solution, but I feel like it's an overkill. I've recently flagged a different clickbait submission, about Android Developer Verification, whose title suggested a significant update but that merely linked to the same old generic page about the anti-feature that was posted here months prior. Around 100 points too, before a mod stepped in, changed the title, and took it down.

Maybe the upvote button is just too easy to reach? I have a feeling that hiding it behind CSS :visited could make a massive difference.


It all depends on the tools. AI will surely give a competitive advantage to people working with better languages and tooling, right? Because they can tell the AI to write code and tests in a way that quashes bugs before they can even occur.

And then they can ship those products much faster than before, because human hours aren't being eaten up writing out all of these abstractions and tests.

The better tooling will let the AI iterate faster and catch errors earlier in the loop.

Right?


I've been going heavily in the direction of globally configured MCP servers and composite agents with copilot, and just making my own MCP servers in most cases.

Then all I have to do is let the agents actually figure out how to accomplish what I ask of them, with the highly scoped set of tools and sub agents I give them.

I find this works phenomenally, because all the .agent.md file is, is a description of what the tools available are. Nothing more complex, no LARP instructions. Just a straightforward 'here's what you've got'.

And with agents able to delegate to sub agents, the workflow is self-directing.

Working with a specific build system? Vibe code an MCP server for it.

Making a tool of my own? MCP server for dev testing and later use by agents.

On the flipside, I find it very questionable what value skills and reusable prompts give. I would compare it to an architect playing a recording of themselves from weeks ago when talking to their developers. The models encode a lot of knowledge, they just need orientation, not badgering, at this point.


I’ve had success with this general approach too.

The best thing I’ve done so far is put GitHub behind an API proxy and reject pushes and pull requests that don’t meet a criteria, plus a descriptive error.

I find it forgets to read or follow skills a lot of the time, but it does always try to route around HTTP 400s when pushing up its work.


Weird clone of OpenFront but with no UI .. ? Custom game doesn't work, tons of errors, can't even see where I am, or understand what is going on.

Doesn't seem like you played your own game before submitting it.


This is a WIP version, and custom game seems to work for many players. I would've really appreciated if you told me how it didn't work so I can fix it.


why play when you can vibecode?


Trying to go the Spec -> LLM route is just a lost cause. And seems wasteful to me even if it worked.

LLM -> Spec is easier, especially with good tools that can communicate why the spec fails to validate/compile back to the LLM. Better languages that can codify things like what can actually be called at a certain part of the codebase, or describe highly detailed constraints on the data model, are just going to win out long term because models don't get tired trying to figure this stuff out and put the lego bricks in the right place to make the code work, and developers don't have to worry about UB or nasty bugs sneaking in at the edges.

With a good 'compilable spec' and documentation in/around it, the next LLM run can have an easier time figuring out what is going on.

Trying to create 'validated english' is just injecting a ton of complexity away from the area you are trying to get actual work done: the code that actually runs and does stuff.


I have good success using Copilot to analyze problems for me, and I have used it in some narrow professional projects to do implementation. It's still a bit scary how off track the models can go without vigilance.

I have a lot of worry that I will end up having to eventually trudge through AI generated nightmares since the major projects at work are implemented in Java and Typescript.

I have very little confidence in the models' abilities to generate good code in these or most languages without a lot of oversight, and even less confidence in many people I see who are happy to hand over all control to them.

In my personal projects, however, I have been able to get what feels like a huge amount of work done very quickly. I just treat the model as an abstracted keyboard-- telling it what to write, or more importantly, what to rewrite and build out, for me, while I revise the design plans or test things myself. It feels like a proper force multiplier.

The main benefit is actually parallelizing the process of creating the code, NOT coming up with any ideas about how the code should be made or really any ideas at all. I instruct them like a real micro-manager giving very specific and narrow tasks all the time.


TBH it kinda makes sense why personal projects are where productivity jumps are much larger.

Working on projects within a firm is... messy.


This seems like a step backwards. Programming Languages for LLMs need a lot of built in guarantees and restrictions. Code should be dense. I don't really know what to make of this project. This looks like it would make everything way worse.

I've had good success getting LLMs to write complicated stuff in haskell, because at the end of the day I am less worried about a few errant LLM lines of code passing both the type checking and the test suite and causing damage.

It is both amazing and I guess also not surprising that most vibe coding is focused on python and javascript, where my experience has been that the models need so much oversight and handholding that it makes them a simple liability.

The ideal programming language is one where a program is nothing but a set of concise, extremely precise, yet composable specifications that the _compiler_ turns into efficient machine code. I don't think English is that programming language.


Can someone explain to me why anyone would do this, and then tweet about it..? Is he really trying to blame 'ai agents' and 'terraform' .. ??


It is hardly a new problem =3

"Memoirs of extraordinary popular delusions and the madness of crowds" (Charles Mackay, 1852)

https://www.gutenberg.org/files/24518/24518-h/24518-h.htm


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: