At a very high level, think fruit sorting[0] where the conveyor belt doesn't sto...

sorenjan · 2026-03-04T19:14:12 1772651652

But that's not something you'd use an LLM for. There have been computer vision systems sorting bad peas for more than a decade[0], of course there are plenty of use cases for very fast inspection systems. But when would you use an LLM for anything like that?

[0] https://www.youtube.com/watch?v=eLDxXPziztw

arcanemachiner · 2026-03-04T23:34:07 1772667247

Nobody said you would use an LLM for that. It's an example of a process where "industrial inspection, in particular, [would] benefit from lower latency in exchange for accuracy".

The point of their comment isn't that you would use an LLM to sort fruit. It was just an illustrative example.

sorenjan · 2026-03-05T00:24:25 1772670265

The discussion was about fine-tuned Qwen models, not industrial inspection in general. I would also find it interesting to learn about what kind of edge AI industrial inspection task you could do with fine-tuned llms, not some handwavy answer about how sometimes latency is important in real time systems. Of course it is, so generally you don't use models with several billion parameters unless you need to.

arcanemachiner · 2026-03-05T01:51:51 1772675511

The thread you're in broke away from the main discussion topic.

Again: Nobody is using LLMs to (for example) sort fruit. But there are some industrial processes that prioritize latency over reliability.

nl · 2026-03-05T06:34:26 1772692466

No, we are literally trying to find a use case where using a lower accuracy LLM makes sense for a vision task.

But fine - what are these industrial processes where that prioritize latency over reliability and using a LLM - as mentioned by the OP - makes sense?

IanCal · 2026-03-05T16:27:11 1772728031

> No, we are literally trying to find a use case where using a lower accuracy LLM makes sense for a vision task.

They're reconfigurable on the fly with little technical expertise and without training data, that's really useful. Personally in projects for people I've found models have fewer unusual edge cases than traditional models, are less sensitive to minor changes in input and are easier to debug by asking them what they can see.

sorenjan · 2026-03-05T21:37:25 1772746645

Seems like a way to use a sledgehammer to hammer in screws, and inviting nondeterminism in important systems. Besides being way larger and more complex than what most specialized industrial processes need, they are also vulnerable to adversarial attacks.

https://www.lakera.ai/blog/visual-prompt-injections

https://www.theverge.com/2021/3/8/22319173/openai-machine-vi...

IanCal · 2026-03-06T20:57:47 1772830667

> Seems like a way to use a sledgehammer to hammer in screws

The lazy analogy the other way is that developing a custom system to do these jobs is like hiring a team of experts to spend 2 years designing the perfect crosshead screwdriver that fits exactly one screw (and doesn't work if the screw starts slightly rotated) when you have a flathead one right next to you that'll work and it'll work right now.

> and inviting nondeterminism in important systems.

Traditional ML is just as non-deterministic.

> they are also vulnerable to adversarial attacks.

Typically not relevant in these kinds of cases but also this is easily a problem in many traditional ML algos.

Have you worked on things like this?

sorenjan · 2026-03-08T18:12:36 1772993556

A flathead screwdriver is not a valid analogy, because LLMs are big complicated and opaque machines. And while other ML methods are non-deterministic as well, gaussian process, decision trees or even CNNs are easier to try to make sense of than these huge black boxes.

And I still haven't seen a single example of anyone actually using a finetuned Qwen in industrial inspection, which leads me to believe than nobody is actually using it for that, but some people want to use it because it's their new favorite toy. You don't need a VLM to count cells in microscopy images, or find scratches in painted parts, or estimate output from a log in a saw mill. I can see the use case for things like describing a scene from a surveillance camera, finding a car of a certain model and colour, or other tasks that demand more reasoning or description. But in those cases latency is not super important compared to getting the right output, which was the tradeoff discussed from the start of this thread.

The last thing I'd want to deal with is to have a computer say something like "You're absolutely right, it was wrong of me to classify the metal debris as food".

IanCal · 2026-03-08T22:07:51 1773007671

I’ve used multimodal LLMs for this sort of task and if a fine tuned model would get reasonable performance compared to frontier models I’d use that. Running things purely locally lets you massively simplify the overall architecture and data transfer requirements of some of these tasks if nothing else and lower latency means you can report problems much faster (vs transfer images off device, batch process).

> The last thing I'd want to deal with is to have a computer say something like "You're absolutely right, it was wrong of me to classify the metal debris as food".

The cnn will do that potentially more often and it can be because it’s just not seen enough examples of the debris at that angle or something else equally irrelevant to a human.

0xbadcafebee · 2026-03-04T21:33:00 1772659980

You would use a VLM (vision language model). The model analyzes the image and outputs text, along with general context, that can drive intelligent decisions. https://tryolabs.com/blog/llms-leveraging-computer-vision

embedding-shape · 2026-03-04T18:06:17 1772647577

But why would I want to results to be done faster but less reliable, vs slower and more reliable? Feels like the sort of thing you'd favor accuracy over speed, otherwise you're just degrading the quality control?

bigyabai · 2026-03-04T18:16:40 1772648200

The high-nines of fruit organization are usually not worth running a 400 billion parameter model to catch the last 3 fruit.

CamouflagedKiwi · 2026-03-04T23:12:09 1772665929

It's not that you want it to be faster, but you want the latency to be predictable and reliable, which is much more the case for local inference than sending it away over a network (and especially to the current set of frontier model providers who don't exactly have standout reliability numbers).

embedding-shape · 2026-03-05T12:53:58 1772715238

> which is much more the case for local inference than sending it away over a network

Of course, but that isn't what unclear here.

What's unclear is why a 7b LLM model would be better for those things than say a 14b model, as the difference will be minuscule, yet parent somehow made the claim they make more sense for verification because somehow latency is more important than accuracy.

0cf8612b2e1e · 2026-03-04T18:56:33 1772650593

Local, offline system you control is worth a lot. Introducing an external dependency guarantees you will have downtime outside of your control.

embedding-shape · 2026-03-04T21:14:03 1772658843

Right, but that doesn't answer why you'd need a fast 7b LLM rather than a slightly less fast 14b LLM.

0cf8612b2e1e · 2026-03-04T21:37:54 1772660274

In the hypothetical fruit sorting example, if you have a hard budget of 10 msec to respond and the 7B takes 8 msec and the 14B takes 12msec, there is your imaginary answer. Regular engineering where you have to balance competing constraints instead of running the biggest available.

jwatte · 2026-03-04T23:24:16 1772666656

Hard real time is a thing in some systems. Also, the current approaches might have 85% accuracy -- if the LLM can deliver 90% accuracy while being "less exact" that's still a win!

IanCal · 2026-03-05T16:04:06 1772726646

Can you fit the 14B on the device they're using? That feels rather important.

And then it depends on whether there is a useful difference in performance between the two.

0xbadcafebee · 2026-03-04T21:33:49 1772660029

....because sometimes people need a faster answer? There's many possible reasons someone might need speed over accuracy. In the food sorting example, if lower accuracy means you waste more peanuts, but the speed means you get rid of more bad peanuts overall, then you get fewer complaints about bad peanuts, with a tiny amount of extra material waste.