The economics work if you generate the video locally, using your own compute and a pretrained model provided for a fee. Compute is the expensive part, and local users could trade time for money. What's missing is a business or security model that lets the vendor distribute the model for people to run locally. Sure, you might need to wait all night for 10 seconds of video generated on your 4090, but you could do it, and folks might even pay for the privilege of using the pretrained model. With enough time and users, licensing for local compute might even pay back the cost of training the model.
This is the model that makes sense to me and I'm surprised nobody at OpenAI pursued it. Yeah a 4090 would take hours for 10 seconds of video, but people already do this. The SD/ComfyUI crowd runs overnight batch generations on consumer GPUs and doesn't care about latency.
Charge for model access, let users burn their own power. Basically Llama but for video (pun intended).
The reason it won't come from OpenAI is the deepfake thing. Distribute the weights and you lose all moderation. Sora already had a deepfake disaster WITH server-side controls. Without any? Good luck.
But yeah, for someone willing to go open-weights, there's a real business there.
I mean, technically that is a system-level feature... and there's nothing really wrong with an application adjusting its own volume as defined by a system-level volume setting for that app.
> I was afraid the puzzle-solving was over. But it wasn't—it just moved up a level.
The craft can move up a level too. You can still make decisions about the implementation: which algorithms to use, how to combine them, how and what to test -- essentially crafting the system at a higher level. In a similar sense, we lost the hand-crafting of assembly code as compilers took over, and now we're losing the crafting of classes and algorithms to some extent, but we still craft the system -- what it does, how it does it, and most importantly, why.
Just saying "no" is unclear. LLMs are still very sensitive to prompts. As a general rule, I would recommend being more precise and assuming less. Of course, you also don't want to be too precise, especially about "how" to do something, which tends to back the LLM into a corner and cause bad behavior. In my experience, focus on communicating intent clearly.
Yes, more time on up-front spec and plan building. Bite-sized, specifically to fit within the context window of a single implementation session. Each step should have a verification process that includes new tests.
Prior to each step, I prompt the AI to review the step and ask clarifying questions to fill any missing details. Then implement. Then prompt the AI after to review the changes for any fixes before moving on to the next step. Rinse, repeat.
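The per-step loop above is simple enough to sketch. This is a hypothetical skeleton, not anyone's actual tooling: `ask_llm` and `run_tests` are placeholder callables standing in for whatever chat interface and test runner you use.

```python
def run_step(step, ask_llm, run_tests):
    """Drive one bite-sized plan step through review, implement, review, verify.

    `ask_llm` is a hypothetical prompt->response callable; `run_tests` is a
    hypothetical step->bool test runner. Both are placeholders.
    """
    # 1. Pre-implementation review: surface missing details before writing code.
    questions = ask_llm(f"Review this step and ask clarifying questions:\n{step}")
    # (Answer the questions and fold the answers back into the step spec here.)

    # 2. Implement the step.
    change = ask_llm(f"Implement this step:\n{step}")

    # 3. Post-implementation review: catch fixes before moving on.
    review = ask_llm(f"Review these changes for any needed fixes:\n{change}")

    # 4. Verify with the step's tests before starting the next step.
    if not run_tests(step):
        raise RuntimeError(f"step verification failed: {step!r}")
    return change
```

The point is the shape, not the code: each step is reviewed before and after implementation, and nothing proceeds until its tests pass.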
The specs and plans are actually better for sharing context with the rest of the team than a traditional review process.
I find the code generated by this process to be better, in general, than the code I've generated over my previous 35+ years of coding: more robust, more complete, better tested. I used to "rush" through this process, with less upfront planning and more of a focus on getting a working scaffold up and running as fast as possible. Each step along the way was implemented a bit quicker and less robustly, with the assumption that I'd return to fix up the corner cases later.
I think that comment is interesting as well. My view is that there is a lot of Electron training code, and that helps in many ways, both in terms of the app architecture and the specifics of dealing with common problems. Any new architecture would have unknown and unforeseen issues, even for an LLM. The AIs are exceptional at doing stuff they have been trained on, and even at abstracting some of the lessons. The further you deviate from a standard app, perhaps even a standard CRUD web app, the less the AI knows about how to structure it.
Thanks for the performance info! More recent Apple chips get much better performance. Also worth trying the Fast quality setting. Great suggestion about default camera positions. I'll add that to the to-do list. Love the idea of a blindfolded chess app with voice control.