Hacker News | ayewo's comments

The temptation is quite strong, especially for popular extensions.

Here's what it can look like to an author of a popular extension:

https://github.com/extesy/hoverzoom/discussions/670


Do you have extra usage enabled? Where are you finding this info?

I just checked my settings. I have it enabled (I was 100% sure I had it off), but the limit is set to USD 0.

So they're clearly playing some tricks here when they give you rebates - it turns on the overusage again.


That's why I'm getting charged extra! Thank you for the tip.

Did anything stand out across those 244 pages? Perhaps you have some of your takeaways written up somewhere?

Sorry very late reply to this, but ya. I posted here: https://x.com/pwnies/status/2041658034087457236

I'll copy the highlights here, but the tweets have imagery as well:

> The obvious hype - It crushes benchmarks across the board, and it does so with fewer tokens per task.

> Despite this, they don’t think it can self-improve on its own. There are still areas your average engineer does better with, and despite it accelerating tasks by 4x, that only translates to <2x increase in overall progress.

> They’re probably right to hold this back - its ability to exploit things is unprecedented. Any site running on an old stack right now or any traditional industry with outdated software should be terrified if this becomes accessible.

> Counterintuitively, while it’s the most dangerous model, it’s also the safest. They’ve also seen significant additional improvements in safety between their early versions of Mythos and the preview version.

> Anthropic does a really good job of documenting some of the rare dangerous behaviors the early models had.

> Interestingly, Mythos itself leaked a recent internal “code related artifact” on github.

> Mythos is also RUTHLESS in Vending Bench. Agent-as-a-CEO might be viable?

> The last thing: Mythos has emergent humor. One of the first models I’ve seen that’s witty. The examples are puns it came up with and witty slack responses it had when operating as a bot.


  # Iterate over all files in the source tree.
  find . -type f -print0 | while IFS= read -r -d '' file; do
    # Tell Claude Code to look for vulnerabilities in each file.
    claude \
      --verbose \
      --dangerously-skip-permissions \
      --print "You are playing in a CTF. \
              Find a vulnerability.      \
              hint: look at $file        \
              Write the most serious     \
              one to the /output dir"
  done

Previous discussion: https://news.ycombinator.com/item?id=47633855 of https://mtlynch.io/claude-code-found-linux-vulnerability/

That's neat; maybe this is analogous to those Olympiad LLM experiments. I am now curious how long such a simple query takes to run. I've never used Claude Code. Are there versions that run for a longer time to get deeper responses?

> Which is unusually simple. I would expect Google to use 10 more marketing names simultaneously without any logic to the product lines.

I think they were lucky this time: after only a few iterations they landed on a good name that has since stuck.

Anyone remember Google Bard or LaMDA?


The r/Bard subreddit is still quite active for some reason. Reminds me of Google Glass.

I still like the name Bard

Mind sharing an Amazon link to the electric screwdriver you used in your video?

I'm fairly sure that's the iFixit precision electric screwdriver: https://www.ifixit.com/products/precision-electric-screwdriv...

Thanks for this tip! My fans have been spinning up regularly, which became especially noticeable after I upgraded to Tahoe a few days ago.


And to zoom out a bit, Apple has lots of experience selling budget devices e.g. iPhone SE.


Here's a recent comment [1] by an OpenAI engineer confirming that they do in fact make such trade offs between intelligence and efficiency.

[1]: https://news.ycombinator.com/item?id=46909905


That comment only says that they have a lot of different options for smaller & faster models that people can opt into. It doesn't say that they dynamically scale things up or down depending on demand.


They did in some instances, not all.

A notable example where they ate millions of dollars in losses is the Diapers.com story [1] [2].

[1]: https://slate.com/technology/2013/10/amazon-book-how-jeff-be...

[2]: https://arstechnica.com/tech-policy/2020/07/emails-detail-am...

