In security there has always been a tension between protecting resources and letting users access those resources. With many systems you have admin/root users and regular users. Some things require root access, but most interesting things (from a security point of view) live in the user directory, because that's where users spend all their time. It's where you'll find credentials, files with interesting stuff inside, etc. All the stuff that needs protecting.
The whole point of using a computer is being able to use it. For programmers, that means building software. Which until recently meant having a lot of user land tools available ready to be used by the programmer. Now with agents programming on their behalf, they need full access to all that too in order to do the very valuable and useful things they do. Because they end up needing to do the exact same things you'd do manually.
The current security modes in agents are binary: super anal about absolutely everything, or off. It's a false choice. It's technically your choice to make, waiving their liability (which is why they need you to opt in); but the software is frustrating to use unless you make that choice. So lots of people make that choice. I'm guilty as well. I could approve every ansible and ssh command manually (yes, really). But a typical session where codex follows my guardrails to manage one of my environments, using ansible scripts it maintains, involves a whole lot of such commands. I feel dirty doing it. But it works so well that doing all that stuff manually is not something I want to go back to.
It's of course insecure as hell and I urgently need something better than yolo mode for this. One of the reasons I like codex is that (so far) it's pretty diligent about instruction following and guard rails. It's what makes me feel slightly more relaxed than I perhaps should be. It could be doing a lot of damage. It just doesn't seem to do that.
Not necessarily kill; but it will slowly push them off the critical path. Local agents can delegate to remote sub agents as needed but should default to local processing for low cost and latency reasons.
I think the notion of a one-size-fits-all model is a bit like a sports car, in the sense that just getting the biggest/fastest/best one is overkill; you use bigger models when needed. But they use a lot of resources and cost you a lot. A lot of AI work isn't solving important math or algorithm problems, or leet coding exercises. Most AI work is mundane plumbing: summarizing, a bit of light scripting/programming, tool calling, etc. With skills and guard rails, you actually want agents to follow those rather than get too creative. And you want them to work relatively quickly and not overthink things. Latency is important. You can actually use guard rails to decide when to escalate to bigger models and when not to.
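The escalation idea can be sketched in a few lines. This is a toy illustration, not any particular product's API; the model names and the routing rules are invented:

```python
# Minimal sketch of guardrail-driven model escalation (all names invented):
# default routine work to a small, fast, cheap model and only escalate
# when a task matches an explicit "hard problem" signal from the guardrails.
SMALL, LARGE = "small-fast-model", "large-reasoning-model"

def pick_model(task: str, needs_deep_reasoning: bool) -> str:
    """Default to the low-latency model; escalate only on explicit signals."""
    if needs_deep_reasoning or "design a new algorithm" in task:
        return LARGE
    return SMALL

print(pick_model("summarize this log file", False))
print(pick_model("design a new algorithm for scheduling", False))
```

The point is that the routing decision is codified, cheap, and deterministic, so the expensive model only runs when the guardrails say so.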
Exactly. There's a difference between vibe coding and agentic software engineering. One is just prompting and hoping for the best. It works surprisingly well, up to a point. And then it doesn't. If that's happening to you, you might be doing it wrong. The other is forcing agents to do it right. Working in a TDD way, cleaning up code that needs cleaning up, following processes with checklists, etc. You need to be diligent about what you put in there and there's a lot of experience that translates into knowing what to ask for and how. But it boils down to being a bit strict and intervening when it goes off the rails and then correcting it via skills such that it won't happen again.
I've been working on an Ansible code base in the past few weeks. I manually put that together a few years ago and unleashed codex on it to modernize it and adapt it to a new deployment. It's been great. I have a lot of skills in that repository that explain how to do stuff. I'm also letting codex run the provisioning and do diagnostics. You can't do that unless you have good guard rails. It's actually a bit annoying because it will refuse to take short cuts (where I would maybe consider one) and sticks to the process.
I actually don't write the skills directly. I generate them. Usually at the end of a session where I stumbled on something that works. I just tell it to update the repo local skills with what we just did. Works great and makes stuff repeatable.
I'm at this point comfortable generating code in languages I don't really use myself. I currently have two Go projects that I'm working on, for example. I'm not going to review a lot of that code ever. But I am going to make sure it has tests that prove it implements detailed specifications. I work at the specification level for this. I think a lot of the industry is going to be transitioning that direction.
What sodium ion lacks in energy density, it actually partially gains back in the reduced need for cooling. The same properties that make it work across a larger temperature range also mean that you don't need a lot of (or any) cooling/heating to condition the battery. That means less weight is used for that and less energy is needed for running a heat pump.
Another thing here is that volumetric density matters more than gravimetric density in cars. Space comes at a premium, and while weight affects efficiency somewhat, it pales in comparison to aerodynamics and rolling resistance. The efficiency difference between the best and the worst cars on the road is at least 3x: some heavy, brick-shaped monstrosities barely do 1.5 miles per kWh, while some cars with a low drag coefficient easily do 5-6 miles per kWh. Even swapping tires can add meaningful range. Weight reductions help a bit, but the difference between the best and worst energy densities on a 60 kWh battery amounts to maybe 1-2 big passengers in terms of weight.
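To put a rough number on that last claim: the cell densities below are assumptions for illustration (roughly ~250 Wh/kg for a good lithium NMC cell versus ~160 Wh/kg for a current sodium-ion cell), not measured figures for any specific product.

```python
# Back-of-the-envelope check of the "1-2 big passengers" claim.
# Gravimetric densities are illustrative assumptions, not measured data.
def cell_mass_kg(pack_kwh: float, wh_per_kg: float) -> float:
    """Mass of the cells needed to reach a given pack capacity."""
    return pack_kwh * 1000 / wh_per_kg

nmc = cell_mass_kg(60, 250)     # 240 kg of cells
sodium = cell_mass_kg(60, 160)  # 375 kg of cells
print(round(nmc), round(sodium), round(sodium - nmc))  # 240 375 135
```

A ~135 kg delta on a 60 kWh pack is indeed in the range of one or two large passengers, before accounting for the cooling hardware the sodium pack can skip.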
Peak Energy makes sodium-ion batteries for energy storage. Their pilot batteries are deployed in a desert: high temperatures during the day, freezing temperatures at night. They use only passive cooling, without any moving parts (fans, pumps, etc.). Aside from being impressive, that also lowers maintenance cost because it reduces the amount of stuff that actually needs servicing.
Sodium ion gains back volume because it doesn't need cooling. At the cell level, they are worse but at the pack level, it starts looking pretty decent. Anyway, there are multiple sodium ion batteries on the road now in China. It's practical right now. The rest is just the widening technology gap the US and EU have with China. We'll just have to wait a few years for local manufacturers to catch up. Some models with these batteries will probably start making it to the EU in the next two years or so.
Same here, I never even managed to get in. My account is good enough to take my money for other things, but somehow I can't onboard into the damn thing so that I can actually manage devices for my company. I just gave up in the end.
I'll try again next month and see how far I get. This needs to be way simpler than it currently is. Hopefully they've fixed a few things there.
Something like OpenAI's agent mode, where it drives a mouse and keyboard but against an emulator, should be doable. That agent mode is BTW super useful for doing QA, executing elaborate test plans, and reporting issues and UX problems. I've been meaning to do more with that after an impressive report I got with minimal prompting when I tried this a few months ago.
That's very different from scripting together what is effectively a whitebox test against document ids which is what people do with things like playwright. Replacing manual QA like that could be valuable.
Exactly. We need more tools like this. With the right model, picking apart images and videos isn't that hard. Adding vision to your testing removes a lot of guesswork from AI coding when it comes to fixing layout bugs.
A few days ago I had an interaction with codex that roughly went as follows: "this chat window is scrolling off screen, fix", "I've fixed it", "No you didn't", "You are totally right, I'm fixing it now", "still broken", "please use a headless browser to look at the thing and then fix it", "....", "I see the problem now, I'm implementing a fix and verifying the fix with the browser", etc. This took a few tries and it eventually nailed it. And added the e2e test of course.
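The kind of check the headless-browser pass automates is simple geometry. A sketch (the numbers are invented; the box shape matches what e.g. Playwright's `bounding_box()` returns):

```python
# Detect the "scrolling off screen" class of layout bug from an element's
# bounding box and the viewport size. Box format: x, y, width, height,
# as returned by headless-browser APIs such as Playwright's bounding_box().
def overflows_viewport(box: dict, viewport: dict) -> bool:
    """True if the element extends past the right or bottom viewport edge."""
    return (box["x"] + box["width"] > viewport["width"] or
            box["y"] + box["height"] > viewport["height"])

# The chat window from the anecdote, before the fix (invented numbers):
print(overflows_viewport({"x": 0, "y": 600, "width": 400, "height": 300},
                         {"width": 1280, "height": 800}))  # True
```

Once the agent has a check like this to run, "I fixed it" becomes verifiable instead of aspirational.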
I usually prompt codex with screenshots for layout issues as well. One of the nice things of their desktop app relative to the cli is that pasting screenshots works.
A lot of our QA practices are still rooted in us checking stuff manually. We need to get ourselves out of the loop as much as possible. Tools like this make that easier.
I think I recall Mozilla pioneering regression testing of their layout engine using screenshots about a quarter century ago. They had a lot of stuff landing in their browser that could trigger all sorts of weird regressions. If screenshots changed without good reason, that was a bug. Very simple mechanism and very effective. We can do better these days.
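The mechanism is simple enough to sketch. A minimal version (file names are illustrative; this is the crudest possible variant): hash the rendered screenshot and compare it against a stored golden copy, and treat any unexplained change as a regression until a human blesses the new screenshot.

```python
import hashlib
from pathlib import Path

def file_digest(path: Path) -> str:
    """SHA-256 of the raw screenshot bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def check_screenshot(current: Path, golden: Path) -> bool:
    """True if the screenshot is byte-identical to the blessed golden copy.
    Any difference is flagged as a potential layout regression."""
    return file_digest(current) == file_digest(golden)
```

Real systems compare pixels with a tolerance, since anti-aliasing and fonts vary across machines; exact byte hashing is just the simplest version of the idea. With vision models we can now go further and ask what changed and whether it looks wrong.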
ah, I feel your pain.. Codex interaction is exactly the pain point. "I fixed it" / "no you didn't" five times in a row; you feel gaslit by your own agent, in a way. That's the loop I wanted to kill. I didn't know about Mozilla's screenshot regression testing, actually.
A good mental model for what we do in the IT industry ever since it came into existence is automating things that feel repetitive and uncreative. Drudgery is a thing where for whatever reason technology falls short and you have to do tedious manual work.
AI tools remove drudgery at an unprecedented rate. My favorite new way of creating projects is: 1) create an empty directory, 2) point codex at it, give it some example git repos (on disk or by url), and tell it to use that repo as a template, copy some features/skills from it, and then build me an X.
That completely wipes out the drudgery of setting up a new project, fiddling with whatever to get it just right and doing a bunch of work to get some basic mvp in place. You kind of hit the ground running 5 minutes into this. Same with debugging. "CI failed, check what happened and fix it". Or "Follow the release skill and cut a new release". I have a skill dialed in to do a lot of checks around that; it also follows CI to ensure things go ahead as planned. All stuff I used to do manually.
> How does the modern "agent explosion" potentially affect this?
This changes everything. Agents don't really care what versioning software is used. They can probably figure out whatever you are using. But they'll likely assume it's something standard (i.e. Git), so the easiest thing is to not get too adventurous. Also, the reasons to use something else mostly boil down to user friendliness and new merge strategies. However, lately I just tell codex to pull and deal with merge conflicts. It's not something I have to do manually anymore. That removes a key reason for me to be experimenting with alternative version control systems. It's not that big of a problem anymore.
Git was actually designed for massive teams (the Linux kernel) but you have to be a bit disciplined using it in a way that many users in smaller teams just aren't. With agentic coding tools, you can just codify what you want to happen in guardrails and skills. Including how to deal with version control and what process to follow.
Where more advanced merge strategies could be helpful is the type of large-scale refactoring that is now much easier with agentic coding tools. But doing that in repositories with lots of developers working on other changes is not something that should happen very often. And certainly not without a lot of planning and coordination.
>Agents don't really care what versioning software is used
Strongly agree that agents don't care about the VCS, as they will figure out whatever you throw at them. And you are right that merge conflicts are becoming a solved problem when you can just tell an agent to handle them.
But I think there is a much bigger problem emerging that better merge strategies (CRDT or otherwise) do not even touch: the reasoning is gone.
For example the situation taken from the blog is that one side deletes a function while another adds a logging line inside it. The CRDT will give you a better conflict display showing what each side did. Great. But it still doesn't tell you why the function was deleted. Was it deprecated? Moved? Replaced by something else? The reviewer is still reverse-engineering intent from the diff.
This will get much worse with coding agents, as agentic commits are orders of magnitude larger and the commit message barely summarises what happened. An agent might explore three approaches, hit dead ends, flag something as risky, then settle on a solution. All that context vanishes after the session ends.
You are right about codifying guardrails and skills, and I think that is the more productive direction compared to replacing git. We should augment the workflow around it. I also started from a much more radical place, actually, thinking we need to ditch git entirely for agentic workflows [1]. BUT the more I built with agents, the more I realized the pragmatic first step is just preserving the reasoning trail alongside the code, right there in git[2]. No new VCS needed, and the next agent or human that touches the code has the full "WHY" available.
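One pragmatic way to keep the reasoning in git without a new VCS is git notes, which attach arbitrary text to existing commits without rewriting them. This is my sketch of the idea, not necessarily the linked approach; the `agent-reasoning` namespace and the note text are invented, and it assumes you run it inside a repo with at least one commit:

```shell
# Attach the agent's reasoning trail to the commit it produced, in a
# dedicated notes namespace, so the "why" survives after the session ends.
git notes --ref=agent-reasoning add -m \
  "Explored approaches A and B; B broke under load; settled on C. Flagged risk: ..." \
  HEAD

# Any later agent or human can read it back:
git notes --ref=agent-reasoning show HEAD
```

Notes live in the repo and can be pushed and fetched like other refs, so the context travels with the code instead of dying with the chat session.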
> no one is willing to admit the EV tech isn't just there yet
The easy explanation is that it is there. The article is about the rapid decline of companies that believe otherwise. They aren't doing too great.