pants2 · 2026-04-04T15:39:11 1775317151

So what happens if the author ignores this judgement? Surely arbitration can't send someone to prison. According to the web they still need a court to even confirm a monetary penalty.

ceejayoz · 2026-04-04T15:43:43 1775317423

Per the article:

"facing fines of $50,000 for every statement that could be seen to be “negative or otherwise detrimental” to Meta"

> According to the web they still need a court to even confirm a monetary penalty.

No, not necessarily with arbitration. The judgement itself may need to be confirmed in some states; it likely already has been.

pants2 · 2026-04-03T21:28:46 1775251726

While the D5 is a great camera it's ~10 years old. Wonder why they didn't go for the Z9 which is its modern mirrorless equivalent.

jimbosis · 2026-04-03T21:37:48 1775252268

"The Nikon D5 remains the camera of choice for the Artemis II mission and will be assigned primary photographic duties. It is a proven, highly-tested camera that the Artemis II team knows will excel in the high-radiation environment of space. However, as Artemis II Commander Reid Wiseman explained ahead of yesterday’s launch, he successfully fought to have a single Nikon Z9 added to Artemis II’s manifest."

https://petapixel.com/2026/04/02/a-nikon-z9-made-it-aboard-t...

There are more interesting details in the PetaPixel article, such as: "'That’s the camera that they’ll be using, the crew will be using on Artemis III plus, so we were fighting really hard to get that on the vehicle to test out in a high-radiation environment in deep space,' Wiseman said."

H/t to "SiliconEagle73" who linked to that PetaPixel article in the thread linked below.

https://old.reddit.com/r/nasa/comments/1sbfevm/new_high_reso...

zimpenfish · 2026-04-03T21:33:35 1775252015

> Wonder why they didn't go for the Z9 which is its modern mirrorless equivalent.

From [0]: "The D5 was chosen for its radiation resistance, extreme ISO range (up to 3,280,000), and proven reliability in space."

[0] https://www.photoworkout.com/artemis-ii-nikon-d5-moon/

porphyra · 2026-04-03T21:59:44 1775253584

They did bring the Z9: https://petapixel.com/2026/04/02/a-nikon-z9-made-it-aboard-t...

But yeah the grainy photo of the Earth with the D5 at ISO 51200 shows the shortcomings of the ancient DSLR. Still, great shot.

hypercube33 · 2026-04-03T23:44:51 1775259891

I'd argue the D4s and D5 may be some of the best high-ISO cameras I'm aware of, maybe surpassed by the D3s and that one Canon video camera that can seemingly see in the dark (sorry, I'm on mobile). I think the lower numbers produce nicer-looking max ISO noise, but that's all preference. Sony has the A7s as well, but as with some of these, the overall resolution isn't extreme.

jeffreygoesto · 2026-04-04T09:19:37 1775294377

How does the age of the camera influence physics? The only thing that really helps would be increasing the aperture.

pants2 · 2026-04-04T16:29:51 1775320191

Lower noise sensors and better image stabilization for longer exposures
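A rough sketch of why both matter, using the usual simplified pixel model where shot noise and read noise add in quadrature. The electron counts here are made-up illustrative numbers, not D5 or Z9 measurements, and dark current is ignored:

```python
import math


def snr(signal_electrons: float, read_noise_electrons: float) -> float:
    """Per-pixel SNR: signal divided by shot noise and read noise added in quadrature."""
    shot_noise = math.sqrt(signal_electrons)  # photon shot noise is sqrt(signal)
    total_noise = math.sqrt(shot_noise**2 + read_noise_electrons**2)
    return signal_electrons / total_noise


# Hypothetical dim scene: 25 electrons collected per pixel.
noisier_sensor = snr(25, read_noise_electrons=5)
quieter_sensor = snr(25, read_noise_electrons=2)
print(quieter_sensor > noisier_sensor)  # True: lower read noise helps most in the shadows

# Collecting 4x the light (e.g. a 4x longer stabilized exposure) doubles shot-limited SNR.
print(snr(100, 0), snr(400, 0))  # 10.0 20.0
```

So aperture isn't the only lever: a quieter sensor raises SNR at the same light level, and stabilization lets you collect more light without a bigger lens.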

miladyincontrol · 2026-04-04T15:31:56 1775316716

From what I recall reading, it's more or less, "we have established and validated processes for using the D5." It's less about getting the best possible photo, more about making sure what they do take looks fine and doesn't waste a ton of time.

ericcumbee · 2026-04-04T22:13:50 1775340830

The D5 has flight heritage, to use the industry term.

apitman · 2026-04-04T03:56:22 1775274982

It might be the newest thing on the ship.

loloquwowndueo · 2026-04-03T21:34:23 1775252063

Zero point in measuring camera sizes (or other sizes haha) when JWST is floating there.

reactordev · 2026-04-03T21:31:10 1775251870

Government budgets man…

pants2 · 2026-04-03T16:46:56 1775234816

I think it's gorgeous and reminiscent of the Sagrada Familia

pants2 · 2026-04-02T19:17:26 1775157446

Yes, you've described Tailscale + Exit Nodes + Tailnet that you invite your family to. Install Tailscale and enable some devices as exit nodes - it's pretty much as simple as that.

pants2 · 2026-04-02T17:55:42 1775152542

Doesn't this just look like another case of "count the r's in strawberry", i.e. not understanding how tokenization works?

This is well known and not that interesting to me - ask the model to use Python to solve any of these questions and it will get it right every time.
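For illustration, a minimal sketch of the kind of script a model might write when given a Python tool. The helper name `count_letter` is hypothetical, not something from the paper:

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())


print(count_letter("strawberry", "r"))  # 3

# Arithmetic is exact too, since Python integers are arbitrary precision.
print(123456789 * 987654321)
```

The code operates on characters and integers directly, so tokenization never enters into it.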

graemefawcett · 2026-04-02T18:53:28 1775156008

It's not just an issue of tokenization, it's almost a category error. Lisp, accounting and the number of r's in strawberry are all operations that require state. Balancing ((your)((lisp)(parens))) requires a stack, counting the r's in strawberry requires a register, counting to 5 requires an accumulator to hold 4.

An LLM is a router and completely stateless aside from the context you feed into it. Attention is just routing the probability distribution of the next token, and I'm not sure that's going to accumulate much in a single pass.
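To make the "requires state" point concrete, here is a minimal sketch of the explicit state those tasks need in ordinary code. The function names are illustrative only:

```python
def parens_balanced(text: str) -> bool:
    """Check that every '(' has a matching ')' using an explicit stack."""
    stack = []
    for ch in text:
        if ch == "(":
            stack.append(ch)
        elif ch == ")":
            if not stack:
                return False  # closing paren with nothing open
            stack.pop()
    return not stack  # leftover opens mean the input is unbalanced


def count_char(text: str, target: str) -> int:
    """Count occurrences of target using a single accumulator (the 'register')."""
    total = 0
    for ch in text:
        if ch == target:
            total += 1
    return total


print(parens_balanced("((your)((lisp)(parens)))"))  # True
print(count_char("strawberry", "r"))  # 3
```

Both loops carry state from one character to the next, and a linter does the same at larger scale.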

BoredomIsFun · 2026-04-03T04:45:21 1775191521

> An LLM is a router and completely stateless aside from the context you feed into it.

Not the latest SSM and hybrid attention ones.

graemefawcett · 2026-04-04T04:40:50 1775277650

Going from a stateless router to a router with a lossy scratchpad is a step up, but I'm still not going to ask it to check my Lisp. That's what linters are for.

wahnfrieden · 2026-04-02T17:57:25 1775152645

It's not dismissible as a misunderstanding of tokens. LLMs also embed knowledge of spelling - that's how they fixed the strawberry issue. It's a valid criticism and evaluation.

Lerc · 2026-04-02T18:33:36 1775154816

The r's in strawberry presents a different level of task to what people imagine. It seems trivial to a naive observer because the answer is easily derivable from the question without extra knowledge.

A more accurate analogy for humans would be to imagine if every word had a colour. You are told that there is also a sequence of different colours that corresponds to the same colour as that word. You are even given a book showing every combination to memorise.

You learn the colours well enough that you can read and write coherently using them.

Then comes the question of how many chocolate-browns are in teal-with-a-hint-of-red. You know that teal-with-a-hint-of-red is a fruit and you know that the colour can also be constructed by crimson followed by Disney-blond. Now, do both of those contain chocolate-brown or just one of them, how many?

It requires exercising memory to do a task that is underrepresented in the training data, because humans simply do not have to do the task at all when the answer can be derived from the question representation. Humans also don't have the ability that the LLMs need, but the letter representation doesn't need that ability.

wahnfrieden · 2026-04-02T19:19:51 1775157591

That’s what makes it a fair evaluation and something that requires improvement. We shouldn’t only evaluate agent skills by what is most commonly represented in training data. We expect performance from them on areas that existing training data may be deficient at providing. You don’t need to invent an absurdity to find these cases.

Lerc · 2026-04-02T21:25:08 1775165108

It's reasonable to test their ability to do this, and it's worth working to make it better.

The issue is that people claim the performance is representative of a human's performance in the same situation. That gives an incorrect overall estimation of ability.

azakai · 2026-04-02T18:25:37 1775154337

I do think this is a tool issue. Here is what the article says:

> For the multiplication task, note that agents that make external calls to a calculator tool may have ZEH = ∞. While ZEH = ∞ does have meaning, in this paper we primarily evaluate the LLM itself without external tool calls

The models can count to infinity if you give them access to tools. The production models do this.

Not that the paper is wrong, it is still interesting to measure the core neural network of a model. But modern models use tools.

irishcoffee · 2026-04-03T13:30:36 1775223036

So, the tools can count then?

Humans can fly, they just need wings!

azakai · 2026-04-03T14:56:35 1775228195

It is academically interesting what pure neural networks can do, of course. But when someone goes to Claude and tries to do something, they don't care if it solves the problem using a neural network or a call out to Python. So long as the result is right.

More generally, the ability to use tools is a form of intelligence, just like when humans and crows do it. Being able to craft the right Python script and use the result is non-trivial.

cr125rider · 2026-04-02T18:17:48 1775153868

Seems like it’s maybe also a tool steering problem. These models should be reaching for tools to help solve factual problems. LLMs should stick to prose.

emp17344 · 2026-04-02T18:23:41 1775154221

I think this is still useful research that calls into question how “smart” these models are. If the model needs a separate tool to solve a problem, has the model really solved the problem, or just outsourced it to a harness that it’s been trained - via reinforcement learning - to call upon?

dghlsakjg · 2026-04-02T22:22:46 1775168566

Does it matter if the LLM can solve the problem or if it knows to use a resource?

There’s plenty of math that I couldn’t even begin to solve without a calculator or other tool. Doesn’t mean I’m not solving math problems.

In woodworking, the advice is to let the tool do the work. Does someone using a power saw have less claim to having built something than a handsaw user? Does a CNC user not count as a woodworker because the machine is doing the part that would be hard or impossible for a human?

grey-area · 2026-04-03T04:13:43 1775189623

It does matter because the LLM doesn’t always know when to use tools (e.g. ask it for sales projections which are similar to something in its weights) and is unable to reason about the boundaries of its knowledge.

hooverd · 2026-04-06T15:39:49 1775489989

Is your issue with math in this example the tediousness of the operations or a conceptual lack of understanding of how to solve them?

azakai · 2026-04-02T18:27:52 1775154472

It has "outsourced" it to another component, sure, but does that matter?

What the user sees is the total behavior of the entire system, not whether the system has internal divisions and separations.

emp17344 · 2026-04-02T18:36:54 1775155014

It matters if you’re curious about whether AGI is possible. Have we really built “thinking machines”, or are these systems just elaborate harnesses that leverage the non-deterministic nature of LLMs?

azakai · 2026-04-02T19:16:56 1775157416

An "elaborate harness" that can break down a problem into sub-tasks, write Python scripts for the ones it can't solve itself, and then combine the results, seems able to solve a wide range of cognitive tasks?

At least in theory.

TeMPOraL · 2026-04-02T20:33:26 1775162006

What is the difference? If the "elaborate harness" consists of a mix of "classical" code and ML model invocations, at what point is it disqualified from consideration as a "thinking machine"? Best we can tell, even our brains have parts that are "dumb", interfacing with the parts that we consider "where the magic happens".

stratos123 · 2026-04-02T19:03:11 1775156591

Are you still talking about this paper? No tools were allowed in it.

pants2 · 2026-04-02T14:46:38 1775141198

Not really - I use Brave browser on iPhone, a simple app install, and it blocks ads extremely well, even on YouTube and Instagram.

pants2 · 2026-04-01T20:04:35 1775073875

1. Most AI datacenter plans and valuations are tied not to subscriptions but to a vaguer promise of "AGI," so this isn't likely to pop the bubble IMO (even if it does happen)

2. Historical precedent holds that governments are more likely to suppress rates to spur the economy during wartime.

pants2 · 2026-03-30T05:28:07 1774848487

Was Raycast bought by GitHub or something? Why would it be advertising for Raycast?

Brought to you by Wendy's.

efreak · 2026-03-30T17:11:29 1774890689

Presumably you need to pay Raycast once for a setup operation while you need to pay constantly for Copilot. Why wouldn't you advertise for someone who makes you more money at the same time as advertising for yourself?

pants2 · 2026-03-27T19:23:58 1774639438

This is super cool, and fully agreed that dark patterns / performance issues in TurboTax are frustrating. That said, I'm probably not ready to delegate something that sensitive to AI.

What I'd love to see here is if you actually do use TurboTax, how does your final tax return differ from the vibe-coded one?

pants2 · 2026-03-27T16:09:27 1774627767

Cool project!