That is the $65k question and unfortunately I don't have a pat answer for that y...

vlovich123 · 2026-03-09T03:33:19 1773027199

Yeah, but that still doesn’t let you see “event A happened before event B which led to C”. I’ve had significantly >> 1 bugs where having good logs lets me investigate and resolve the issue so quickly and easily whereas telemetry would have left you searching around forever.

hinkley · 2026-03-09T05:47:52 1773035272

Here’s the thing though. When you’ve got 1000 req/s split across a couple dozen log files all being scanned in parallel there’s really no such thing as tracing a->b->c anyway. It’s the seashore and you’re looking for a specific shell.

You’ve got correlationids, and if your system isn’t reliably propagating those everywhere you absolutely have to fix that. But you’re going to use those once you already notice an uptick in a weird error you haven’t seen before, and it’s hard to see those when you’re generating 8k log entries per second that are 140-200 characters long and so you’re only seeing twenty of them at a time in Splunk.

You have some chatty frontend that’s firing off three requests at the same time and you’re going to struggle period. You’re going to be down to some janky log searches for that and you don’t need to be paying someone $$ every month to still have it rough.

We used to have QA people for this.

vlovich123 · 2026-03-11T00:34:21 1773189261

But most requests don't generate errors / warnings / failures, so you can easily discard most of the logs for those that don't.

> there’s really no such thing as tracing a->b->c anyway

> and it’s hard to see those when you’re generating 8k log entries per second that are 140-200 characters long and so you’re only seeing twenty of them at a time in Splunk.

Except as you note you can have a tag to correlate logs across distributed services. This is already done for jaeger tracing. It would be insanity to try to look at all logs at once. When you're looking at logs it's because something like "customer A complains they had a problem with request XYZ". And honestly, 8k/s is child's play for logging. A system I was running had to start tuning down the log verbosity at ~30k requests/s and that's because it was generating like 8 logs per request (so ~100k logs/s).

> You’re going to be down to some janky log searches for that and you don’t need to be paying someone $$ every month to still have it rough

That's between you and your log ingestion system. You get to pick where you send your logs and the capabilities it has. All the companies I worked at self-hosted their log infrastructure and it worked fine for not a lot of money. You're conflating best practices with "what can I pay a SaaS company to solve for me". Honeycomb.io may be helpful here btw. Their pricing wasn't exorbitantly egregious here and at low to medium scale tracing the way they do it can supplant the need for logging.