
EmilStenstrom · 2026-03-31T05:45:35 1774935935

I somehow find the concept of a general time series model strange. How can the same model predict egg prices in Italy, and global inflation in a reliable way?

And how would you even use this model, given that there are no explanations that help you trust where the prediction comes from…

teruakohatu · 2026-03-31T05:56:00 1774936560

What is not generally understood is that these models don’t predict egg prices or inflation in Italy.

They decompose a time series into trends, seasonality and residuals. That’s what they are actually modelling.

They cannot predict wars in the Middle East influencing inflation unless there is a seasonal pattern(s).
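For the classical case, that decomposition can be sketched in a few lines. This is only an illustration, assuming an additive model, an odd seasonal period and a made-up weekly series; the helper names (`moving_average`, `decompose`) are hypothetical, and it says nothing about what a neural foundation model does internally.

```python
def moving_average(values, window):
    """Centered moving average; positions where the window does not fit stay None."""
    half = window // 2
    out = [None] * len(values)
    for i in range(half, len(values) - half):
        out[i] = sum(values[i - half:i + half + 1]) / window
    return out


def decompose(values, period):
    """Additive split: values[i] = trend[i] + seasonal[i] + residual[i]."""
    trend = moving_average(values, period)  # assumes an odd period
    # Average the detrended values at each position in the cycle.
    buckets = [[] for _ in range(period)]
    for i, (v, t) in enumerate(zip(values, trend)):
        if t is not None:
            buckets[i % period].append(v - t)
    means = [sum(b) / len(b) for b in buckets]
    centre = sum(means) / period
    seasonal = [means[i % period] - centre for i in range(len(values))]
    residual = [None if t is None else v - t - s
                for v, t, s in zip(values, trend, seasonal)]
    return trend, seasonal, residual


# Made-up series: linear trend plus a weekly pattern that sums to zero,
# with a one-off shock on day 40 that no seasonal term can explain.
pattern = [3, 1, -2, -4, -1, 1, 2]
series = [0.5 * i + pattern[i % 7] for i in range(70)]
series[40] += 10.0
trend, seasonal, residual = decompose(series, period=7)
```

The shock on day 40 ends up almost entirely in `residual`, which is the point about wars: a one-off event is not something the trend or seasonal terms can anticipate.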

jcelerier · 2026-03-31T10:25:13 1774952713

> They cannot predict wars in the Middle East influencing inflation unless there is a seasonal pattern(s).

well...

morkalork · 2026-03-31T14:00:20 1774965620

Next you'll suggest something looney like a correlation with the 11-year solar cycle!

(for those who are lost: https://x.com/onionweigher/status/1936630237208469898)

guntars · 2026-03-31T12:47:43 1774961263

The Middle East war season is upon us once again

Forgeties79 · 2026-03-31T13:52:40 1774965160

Born too soon to deploy to the Middle East.

Born too late to deploy to the Middle East.

Born just in time to deploy to the Middle East.

lordgrenville · 2026-03-31T08:38:31 1774946311

That's what traditional time-series modelling does. This is a foundational model, which means it's just a neural network trained on lots of time series. (So maybe OP's question still stands? But it's the same question as "how can LLMs be good at so many different kinds of conversations?")

dist-epoch · 2026-03-31T10:59:22 1774954762

Because traditional time-series modelling (ARIMA, GARCH, ...) is too "simple" and "strict". Just like "simple" computer vision (OpenCV, edge-detection, ...) was crushed by neural networks when having to deal with real world images.

robot-wrangler · 2026-03-31T12:03:11 1774958591

This seemed like a good answer at first. But on further thought, images on the whole really do seem to have quite a bit more standard structure / "grammar" to exploit compared to arbitrary time-series. Many images are of the world, where there is gravity so you might see a preponderance of blobs at the bottom, or the repetitive types like people, animals, faces, eyes. Wildly abstract images still have some continuity: pixels in a neighborhood are likely to be similar.

Time series in general have none of this kind of structure that's strictly necessary. I'm sure that many real-world sensors typically have some gaussian distribution aspects + noise and/or smoothness and locality types of assumptions that are pretty safe, but presumably that simple stuff is exactly what traditional time-series modelling was exploiting.

Maybe the real question is just what kind of time-series are in the training data, and why do we think whatever implicit structure that is there actually generalizes? I mean, you can see how any training that mixes pictures of dogs and cats with pictures of people could maybe improve drawing hair, detecting hair, or let you draw people AND dogs. It's less clear to me how mixing sensor data / financial data / anything else together could be helpful.

dist-epoch · 2026-03-31T12:30:55 1774960255

> It's less clear to me how mixing sensor data / financial data / anything else together could be helpful.

Because many of these have the same underlying causal structures - humans doing things, weather correlations, holidays.

Well studied behavioral stuff like "the stock market takes the stairs up and the elevator down" which is not really captured by "traditional" modelling tools.

I'm sure people will be doing mechanical interpretation on these models to extract what they pattern match for prediction.

torginus · 2026-03-31T12:46:08 1774961168

Personally, coming from an EE background and not finance or statistics, I would go about identifying these patterns with a Signals & Systems toolbox, like systems identification, various matched filters/classifiers.

This might be a totally wrong approach, but I think it might make sense to try to model a matched filter based on previous stock selloff/bullrun trigger events, and then see if it has any predictive ability. Likewise, the market reaction seems to be usually some sort of delayed impulse-like activity, with the whales reacting quickly, and then a distribution of less savvy investors following up the signal with various delays.

I'm sure other smarter people have explored this approach much more in depth before me.
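A rough sketch of the matched-filter idea, assuming a hand-made "sell-off" template and synthetic noise rather than real market data; the template shape, noise level and function name are all made up for illustration. It only shows detection of a known shape, not any predictive ability.

```python
import random


def matched_filter(signal, template):
    """Slide the template over the signal and return the dot product at each offset."""
    n = len(template)
    return [sum(s * t for s, t in zip(signal[i:i + n], template))
            for i in range(len(signal) - n + 1)]


rng = random.Random(0)
# Hypothetical "sell-off" shape: a sharp drop followed by a slower tail.
template = [0.0, -3.0, -5.0, -2.0, -1.0]
# Low-level noise with one copy of the template buried at offset 40.
signal = [rng.gauss(0, 0.1) for _ in range(100)]
for k, value in enumerate(template):
    signal[40 + k] += value

scores = matched_filter(signal, template)
best_offset = max(range(len(scores)), key=scores.__getitem__)
```

`best_offset` comes out at the offset where the template was buried. Whether the same score says anything about what happens next in a real series is exactly the open question.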

esafak · 2026-03-31T13:10:09 1774962609

You're crafting features. The modern approach to ML (deep learning) is to use over-parameterized models and let them learn the features. Perhaps you remember this? https://www.nytimes.com/2012/06/26/technology/in-a-big-netwo...

srean · 2026-03-31T14:26:48 1774967208

Except that their success in the time series domain has been rather lackluster and elusive. It is one of the few domains where old-school models are not only less work to maintain but also more accurate. There are a few exceptions here and there. Every year there are a few neural-net-based challengers. You can follow the M series of competitions from its start to see this evolution.

robot-wrangler · 2026-03-31T15:14:29 1774970069

Maybe because useful time-series modeling is usually really about causal modeling? My understanding is that mediated causality in particular is still very difficult, where adding extra hops in the middle takes CoT performance from like 90% to 10%.

srean · 2026-03-31T16:15:40 1774973740

Yes causal models are hard.

NNs do ok on those time series problems where it is really about learning a function directly off time. This is nonlinear regression where time is just another input variable.

Cases where one has to adjust for temporally correlated errors seem to be harder for NNs. BTW I am talking about accuracies beyond what typical RNN variants will achieve, which is pretty respectable. It's the case that more complicated DNNs don't seem to do much better in spite of their significant model complexity.

orangemaen · 2026-03-31T15:16:20 1774970180

LightGBM won M5 and it wasn't even a competition.

srean · 2026-03-31T16:07:51 1774973271

The task was slightly different and favored GBMs. Note they aren't NNs whose underwhelming performance was what my comment was about.

The M series of competitions change the tasks every year to explore what models perform best under different scenarios. As I mentioned, neural network based models win here and there, but very spotty performance over all.

robot-wrangler · 2026-03-31T12:56:06 1774961766

> Because many of these have the same underlying causal structures - humans doing things, weather correlations, holidays.

Or, you know, maybe they aren't. Thermometers and photon counts are related to weather sometimes, but not holidays. Holidays are related to traffic sensors and to markets, but not Geiger counters.

> Well studied behavioral stuff like "the stock market takes the stairs up and the elevator down" which is not really captured by "traditional" modelling tools.

Prices are the opposite, up like a shot during shocks, falling slowly like a feather. So that particular pattern seems like a great example of over-fitting danger and why you wouldn't expect mixing series of different types to work very well.

dist-epoch · 2026-03-31T13:59:26 1774965566

Electricity demand is influenced very strongly by holidays, strongly by weather and from weak to strong by geopolitics (depending on location).

The model will have a library of patterns, and will be able to pattern match subtle ones to deduce "this time series has the kind of micro-patterns which appear in strongly weather influenced time-series", and use this to activate the weather pattern cluster.

To use your example, when served thermometer data, the model notices that the holiday pattern cluster doesn't activate/match at all, and will ignore it.

And then it makes sense to train it on the widest possible time series, so it can build a vast library of patterns and find correlations of activation between them.

energy123 · 2026-03-31T18:28:30 1774981710

Sometimes you want inductive bias. No universally true claim can be made like this.

cybrox · 2026-03-31T06:10:55 1774937455

Wars in the middle east seem to have increasingly regular patterns tied to stock market opening hours, unfortunately.

rubyn00bie · 2026-03-31T06:52:24 1774939944

I totally agree with the sentiment, but from what I can tell, I’d say they tend to happen immediately before or after markets open and close. Essentially, and to their maximum, screwing absolutely everyone who isn’t in the clique from participating in the trade.

FWIW, the only surefire way to win the trade is to buy time and assume both gross incompetence and negligence when it comes to action. The only caveat is if the markets tank enough, this administration will signal capitulation beforehand, e.g. Trump mildly capitulating on tariffs last April after the markets proceeded to relentlessly defecate themselves.

0-DTE options are typically, and for good reason, stupid gambles. But, right now they can’t even be considered gambling, because there’s zero chance of winning. Not just bad odds, but no odds. Again just signaling how truly malicious this admin is and its disdain for anyone and everyone not close to them.

jofzar · 2026-03-31T07:19:47 1774941587

I mean it's super obvious, it's directly tied to Scrubs popularity.

New season of Scrubs = new war in the Middle East.

FartyMcFarter · 2026-03-31T09:00:27 1774947627

Wow, I didn't know. Thank you! Such a great show.

jofzar · 2026-03-31T12:37:01 1774960621

It's surprisingly good, like it's 100% worth watching if you liked Scrubs.

perks_12 · 2026-03-31T07:23:24 1774941804

I am not familiar with time series models, but judging from your answer, it would be necessary to feed long time series into this model for it to detect trends. What is a token here? Can it, for the lack of a better example, take in all intraday movements of a stock for a day, a week, a month, etc?

teruakohatu · 2026-03-31T07:40:37 1774942837

I tend to avoid time series forecasting when I can help it because I find it hard to communicate to stakeholders that a neural network (or another method) is not an oracle.

If you are talking about granularity of observations, it would depend on what you are trying to predict (the price in an hour or the price in 12 months?) and how quickly you need the prediction (100ms? Tomorrow morning?). If I had infinite data I would use granularity as a hyperparameter and tune that to a level that produced the best test results.

I am for example currently using weekly averages for non-price data forecasting. I could use daily data but weekly is absolutely adequate for this purpose.

ghywertelling · 2026-03-31T11:18:53 1774955933

You can use lightgbm with appropriate feature engineering.

teruakohatu · 2026-04-01T19:06:48 1775070408

Using many different models, just not NN for this particular application.

amelius · 2026-03-31T09:42:46 1774950166

What makes these models different from models used for e.g. audio?

Or other low-dimensional time domain signals?

carschno · 2026-03-31T12:48:29 1774961309

You could abstract speech or other audio as a series of sounds, where time is indeed a factor. Speech, however, has patterns that are more similar to written language than to seasonal patterns that are typically assumed in time series. While trained on different data, the architecture of TimesFM is actually similar to LLMs. But not identical, as pointed out at https://research.google/blog/a-decoder-only-foundation-model...:

> Firstly, we need a multilayer perceptron block with residual connections to convert a patch of time-series into a token that can be input to the transformer layers along with positional encodings (PE).

> [...]

> Secondly, at the other end, an output token from the stacked transformer can be used to predict a longer length of subsequent time-points than the input patch length, i.e., the output patch length can be larger than the input patch length.
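To make the patching idea concrete, here is a rough sketch of the bookkeeping only. The patch lengths (32 in, 128 out) and the 512-point context are illustrative assumptions, the function names are hypothetical, and the MLP block and transformer layers from the quote are left out entirely.

```python
import math


def to_patches(series, patch_len):
    """Split a series into non-overlapping patches, dropping the oldest leftover points."""
    start = len(series) % patch_len
    return [series[i:i + patch_len] for i in range(start, len(series), patch_len)]


def decode_steps(horizon, output_patch_len):
    """Autoregressive steps needed if each step emits one output patch."""
    return math.ceil(horizon / output_patch_len)


context = list(range(512))            # stand-in for 512 observations
patches = to_patches(context, 32)     # each patch would become one token
steps_long = decode_steps(256, 128)   # output patch longer than the input patch
steps_short = decode_steps(256, 32)   # output patch the same size as the input patch
```

With these numbers the context becomes 16 tokens, and a 256-step forecast needs 2 decoding steps instead of 8, which is one plausible reason to let the output patch be longer than the input patch.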

amelius · 2026-03-31T12:51:29 1774961489

If "seasonal patterns" is the thing that differentiates between these two data sources, then perhaps time series models should be called seasonal models?

graemep · 2026-03-31T08:41:01 1774946461

Do these models predict on just a single time series then?

It is far more useful for predictions to look for correlations between time series. This is far more complex than looking for correlations in general because most time series trend up or down and therefore correlate.
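A small illustration of that last point, using two synthetic series that are independent except that both trend upward. The slopes, noise level and seed are arbitrary assumptions, and differencing is just one common way to remove a trend.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def diff(xs):
    """First differences: the change from one step to the next."""
    return [b - a for a, b in zip(xs, xs[1:])]


rng = random.Random(0)
# Two unrelated series that both happen to trend up.
a = [0.5 * t + rng.gauss(0, 1) for t in range(200)]
b = [0.3 * t + rng.gauss(0, 1) for t in range(200)]

corr_levels = pearson(a, b)                  # dominated by the shared trend
corr_changes = pearson(diff(a), diff(b))     # trend removed by differencing
```

The levels correlate almost perfectly, while the step-to-step changes are close to uncorrelated, so the raw correlation says very little about whether one series helps predict the other.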

ReptileMan · 2026-03-31T07:55:15 1774943715

It is the Middle East. Wars are always in season. And supply is more than the demand.

d--b · 2026-03-31T06:10:33 1774937433

The main issue is that people do use them to predict bitcoin prices intraday and that sort of thing.

nico · 2026-03-31T06:28:55 1774938535

Is it an issue because it works, or because it doesn’t? Or because it’s bitcoin?

I genuinely want to know. Thank you

d--b · 2026-03-31T08:04:07 1774944247

It is an issue because bitcoin is highly unpredictable.

These tools are good at predicting time series that are in fact quite predictable. For example, insurers will use this to estimate the number of people who will die from cancer in the next year, the year after that, and so on up to 50 years in the future. The model will extrapolate the progress made in cancer treatment from the current trend, etc. It is a prediction, because it's still possible that a breakthrough comes in and suddenly people don't die from a certain form of cancer, but generally it should be roughly correct.

Bitcoin prices are a lot more chaotic, influenced by a ton of unrelated events that shape its path a certain way. There is absolutely no certainty that studying the shape of its past evolution will help in any way understand its future evolution.

Of course here I mean by studying its price alone. If you add more information, like who's behind each trend and why, you have a much better sense of what could happen next.

visarga · 2026-03-31T06:07:48 1774937268

ARIMA and ARMA models

a-dub · 2026-03-31T14:41:48 1774968108

ar(k) stuff, sure. that's old news. i would expect the newfangled stuff to be good at 0-shot learning of pre-event signatures spread across multiple series, at a minimum.

lovelearning · 2026-03-31T06:12:36 1774937556

My understanding is that the synthetic training data helps capture abstract time-series patterns that are common in all domains.

As they say in appendix 8:

> We create the synthetic data to reflect common time-series patterns using traditional statistical models. We start with four simple times series patterns:

> • Piece-wise linear trends (I), where the number of the piece-wise linear components is randomly chosen between 2 and 8.

> • ARMA(p, q) (II), where 1 ≤ p, q ≤ 8 and the corresponding coefficients are generated from either a multivariate Gaussian or a uniform, then normalized.

> • Seasonal patterns. In particular we create the sine (III) and the cosine (IV) waves of different random periods between 4 and max context length / 2 time-points and time delays.

If there were no such underlying patterns in the class of all time-series data, then even the idea of traditional time-series models would be fundamentally misplaced.

And since this is a transformer model, it also looks for patterns in the problem-specific input data at inference time, just like how the input context to an LLM influences its output's relevance.
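A rough sketch of generators for the pattern types quoted above. The slope range, the way the ARMA coefficients are normalized, and the final additive mix are my own assumptions for illustration and are not taken from the paper.

```python
import math
import random


def piecewise_linear(n, rng):
    """Trend made of 2 to 8 linear pieces with random slopes."""
    k = rng.randint(2, 8)
    breaks = sorted(rng.sample(range(1, n), k - 1))
    bounds = [0] + breaks + [n]
    out, level = [], 0.0
    for start, end in zip(bounds, bounds[1:]):
        slope = rng.uniform(-1, 1)  # assumed slope range
        for _ in range(end - start):
            level += slope
            out.append(level)
    return out


def arma(n, p, q, rng):
    """ARMA(p, q) with uniform coefficients, shrunk so the AR part stays stable."""
    ar = [rng.uniform(-1, 1) for _ in range(p)]
    ma = [rng.uniform(-1, 1) for _ in range(q)]
    scale = sum(abs(a) for a in ar)
    if scale >= 0.9:  # assumed normalization: sum of |AR coefficients| below 1
        ar = [0.9 * a / scale for a in ar]
    noise = [rng.gauss(0, 1) for _ in range(n)]
    out = []
    for t in range(n):
        value = noise[t]
        value += sum(ar[i] * out[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
        value += sum(ma[j] * noise[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        out.append(value)
    return out


def seasonal(n, rng, max_context=512):
    """Sine or cosine wave with a random period and time delay."""
    period = rng.randint(4, max_context // 2)
    delay = rng.randint(0, period - 1)
    wave = rng.choice([math.sin, math.cos])
    return [wave(2 * math.pi * (t - delay) / period) for t in range(n)]


rng = random.Random(0)
trend_part = piecewise_linear(256, rng)
arma_part = arma(256, rng.randint(1, 8), rng.randint(1, 8), rng)
wave_part = seasonal(256, rng)
# One possible mix; how the components are actually combined is an assumption here.
mixed = [a + b + c for a, b, c in zip(trend_part, arma_part, wave_part)]
```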

strongpigeon · 2026-03-31T16:20:31 1774974031

When I worked on Google Ads, we used time series forecasting to compute the odds of an ad campaign reaching its goal (and to tell users how likely they were to hit them).

A ton of (unsophisticated) advertisers would just draw a line from zero to the number they are at today and project that line to the end of the month to forecast the amount of conversions/spend they were going to hit. This of course doesn't take into account various seasonalities (day-of-week, time-of-year, etc.) and gives you a pretty poor forecast. Compared to those, time-series forecasting is much more accurate.

Is it perfectly accurate? No, that's impossible. But when you can train a model on all advertising campaigns, you can give good 95% confidence intervals.
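A toy version of the difference, with made-up numbers: a 28-day month starting on a Monday, weekdays at 100 conversions and weekend days at 20. The function names and the weekday profile are hypothetical, and this is not a description of the actual system.

```python
def straight_line_forecast(observed, days_in_month):
    """Project the average daily rate so far to the end of the month."""
    return sum(observed) / len(observed) * days_in_month


def weekday_aware_forecast(observed, weekday_profile, days_in_month):
    """Scale a known day-of-week profile to match what has been observed so far.

    Assumes day 0 of the month lines up with index 0 of the profile.
    """
    expected_so_far = sum(weekday_profile[d % 7] for d in range(len(observed)))
    scale = sum(observed) / expected_so_far
    expected_month = sum(weekday_profile[d % 7] for d in range(days_in_month))
    return scale * expected_month


profile = [100, 100, 100, 100, 100, 20, 20]   # Mon..Sun, e.g. from past months
observed = [100, 100, 100, 100, 100]          # Monday to Friday of week one
actual_month_total = sum(profile[d % 7] for d in range(28))

naive = straight_line_forecast(observed, 28)
seasonal_aware = weekday_aware_forecast(observed, profile, 28)
```

Here the straight line lands on 2800 because it has only seen weekdays, while the weekday-aware version lands on the true total of 2160.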

thesz · 2026-03-31T11:54:26 1774958066

  > How can the same model predict egg prices in Italy, and global inflation in a reliable way?

For one, there's Benford's law: https://en.wikipedia.org/wiki/Benford%27s_law

So, predict sign (branch predictors in modern CPUs also use neural networks of sorts), exponent (most probably it changes slowly) and then predict mantissa using Benford's law.
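For reference, Benford's law gives the probability of leading digit d as log10(1 + 1/d). A quick check against the leading digits of powers of 2, which I picked as a convenient sequence known to follow the law; this only illustrates the distribution, not the prediction scheme described above.

```python
import math

# Benford's law: P(leading digit = d) = log10(1 + 1/d)
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Leading digits of 2**0 .. 2**999 as a rough empirical check.
counts = {d: 0 for d in range(1, 10)}
for k in range(1000):
    counts[int(str(2 ** k)[0])] += 1
observed = {d: counts[d] / 1000 for d in counts}
```

About 30% of the values start with a 1, both in the formula and in the sample.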

benob · 2026-03-31T06:03:33 1774937013

I would say:

- decomposition: discover a more general form of Fourier transform to untangle the underlying factors

- memorization: some patterns are recurrent in many domains such as power law

- multitask: exploit cross-domain connections such as weather vs electricity

eru · 2026-03-31T06:50:47 1774939847

> How can the same model predict egg prices in Italy, and global inflation in a reliable way?

How can the same lossy compression algorithm (eg JPG) compress pictures of everything in a reliable way?

cenamus · 2026-03-31T06:53:33 1774940013

It can't compress pictures of everything in a reliable way.

Text and anything with lots of high frequency components looks terrible

eru · 2026-03-31T08:20:06 1774945206

It still does pretty well on text. And we have newer formats and ideas that would also deal with that. (To be really dead simple: have a minimal container format that decides between png or jpg, use png for text.)

However: white noise is where it really struggles. But real pictures of the real world don't look like white noise. Even though in some sense white noise is the most common type of picture a priori.

Similar for real world time series: reality mostly doesn't look like white noise.

FartyMcFarter · 2026-03-31T09:07:28 1774948048

White noise is random, so it's incompressible by definition. By JPG or by any other method no matter how clever.

eru · 2026-03-31T09:55:30 1774950930

I have a very peculiar coin. With 1% probability it turns up heads and with 99% probability it turns up tails.

A string of flips is random, but it's very compressible.

In any case, my point was that reality ain't uniformly random. And not only that: pretty much anything you can point your camera at shares enough similarity in their distribution that we pretty much have universal compression algorithms for real world data.
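To put a number on the coin example: the Shannon entropy of a 1%/99% coin is about 0.08 bits per flip, against 1 bit for a fair coin. A quick sketch, storing one byte per flip and using zlib as a stand-in for "a compressor"; the seed and sample size are arbitrary.

```python
import math
import random
import zlib


def entropy_bits(p):
    """Shannon entropy of a coin with heads probability p, in bits per flip."""
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)


def flips(p_heads, n, rng):
    """n random flips stored as one byte each: b'H' or b'T'."""
    return bytes(ord("H") if rng.random() < p_heads else ord("T") for _ in range(n))


rng = random.Random(0)
fair_size = len(zlib.compress(flips(0.5, 100_000, rng), 9))
biased_size = len(zlib.compress(flips(0.01, 100_000, rng), 9))
```

Both strings are random, but the biased one compresses to a small fraction of the fair one.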

hamdingers · 2026-03-31T17:31:10 1774978270

What you're saying is only true for lossless compression, if you're fine discarding data you can compress anything. Try it yourself:

    magick -size 512x512 xc:gray +noise Random noise.png
    magick noise.png -interlace Plane -quality 75 compressed_noise.jpg

Result is ~380k smaller and doesn't look much different at 100%.

eru · 2026-04-01T03:14:18 1775013258

You are right, but that says more about human perception than about the input data.

at_compile_time · 2026-03-31T07:08:15 1774940895

Reliably terrible.

JackeJR · 2026-03-31T10:41:18 1774953678

Actually it can. See https://youtu.be/FUQwijSDzg8?si=LWd5gVNYRd3HH9rJ

Or just search for the James-Stein paradox.

ludicrousdispla · 2026-03-31T14:58:17 1774969097

It's best to think of it as a giant tree, from which you can pick cherries.

nurettin · 2026-04-01T04:42:11 1775018531

> predict egg prices in Italy, and global inflation in a reliable way?

Easy, both go up.

samuelknight · 2026-03-31T14:26:47 1774967207

I think that a model designed to ignore semantic chatter like financial news and deeply inspect the raw data is a very powerful perspective.

annie511266728 · 2026-03-31T08:14:06 1774944846

It’s not really predicting “egg prices” or “inflation” — it’s mostly fitting patterns that happen to show up in those series.

The problem isn’t domain generalization, it’s that we keep pretending these models have any notion of what the data means.

People ask how one model can understand everything, but that assumes there’s any understanding involved at all.

At some point you have to ask: how much of “forecasting” is actually anything more than curve fitting with better marketing?

fjdjshsh · 2026-03-31T10:14:28 1774952068

"curve-fitting" has a long history (centuries old) and could be regarded more as a numerical method issue.

Rigorous understanding of what overfitting is, techniques to avoid it and select the right complexity of the model, etc, are much newer. This is a statistical issue.

My point is that forecasting isn't curve fitting, even though curve fitting is one element of it.

nairadithya · 2026-04-01T01:30:13 1775007013

I don't know how I feel about LLM slop coming to HN.

EmilStenstrom · 2026-03-31T05:41:17 1774935677

Here is the link to the blog post that actually describes what this is: https://github.com/google-research/timesfm?tab=readme-ov-fil...

nels · 2026-03-31T05:56:17 1774936577

I think you meant to link this page: https://research.google/blog/a-decoder-only-foundation-model...

OliverGuy · 2026-03-31T07:14:27 1774941267

Wish they gave some numbers for total GPU hours to train this model, seems comparatively tiny when compared to LLMs so interested to know how close this is to something trainable by your average hobbyist/university/small lab

OliverGuy · 2026-03-31T07:27:41 1774942061

Edit, it looks like the paper does

TPUv5e with 16 tensor cores for 2 days for the 200M param model.

Claude reckons this is 60 hours on an 8xA100 rig, so very accessible compared to LLMs for smaller labs

refulgentis · 2026-03-31T05:53:19 1774936399

That takes me to the same content as the submission, a GitHub repo (Chrome on iOS)

rockwotj · 2026-03-31T05:56:31 1774936591

Probably the better link: https://research.google/blog/a-decoder-only-foundation-model...

akshayshah · 2026-03-31T05:57:45 1774936665

And https://arxiv.org/pdf/2310.10688 if you want the full paper.

Cyuonut · 2026-03-31T05:57:13 1774936633

I suppose they tried to link this: https://research.google/blog/a-decoder-only-foundation-model...

EmilStenstrom · 2026-03-15T10:30:07 1773570607

"Simply put: It’s a big mess, and no off-the-shelf accounting software does what I need. So after years of pain, I finally sat down last week and started to build my own. It took me about five days. I am now using the best piece of accounting software I’ve ever used."

EmilStenstrom · 2026-02-05T20:45:43 1770324343

Doesn't matter which one. All of them can do things like this now, given a good enough feedback loop. Which your problem has.

EmilStenstrom · 2026-01-20T23:08:24 1768950504

Added features since initial release:

- Bleach-like sanitization feature built in and enabled by default

- Transforms API for simple HTML mutations

- Revamped docs

- Playground powered by Pyodide (thanks for the idea simomw!)

EmilStenstrom · 2026-01-17T00:19:03 1768609143

I see that the blog post is mentioning not finding the web100k dataset, it's here: https://github.com/EmilStenstrom/web100k

putlake · 2026-01-17T04:02:40 1768622560

Thanks, Emil! I've updated the post with this link.

EmilStenstrom · 2026-01-17T00:05:52 1768608352

As the author, it's a stretch to say that JustHTML is a port of html5ever. While you're right that this was part of the initial prompt, the code is very different, which is typically not what counts as a "port". Your mileage may vary.

EmilStenstrom · 2026-01-11T17:45:08 1768153508

To see the actual errors, just paste your HTML here and see: https://emilstenstrom.github.io/justhtml/playground/ - any parsing errors show up below the input box.

Some tags do require ending tags, others do not. Personally I find it hard to remember which ones, so I just close things out of caution. That way you’re always spec-correct.

EmilStenstrom · 2026-01-02T18:33:22 1767378802

Data driven test suites are really good for building trust in a library. Both the html5lib-tests suite and my recent xss-bench are examples of this!

EmilStenstrom · 2025-12-31T00:11:01 1767139861

The reason for this was to be able to build trust in the new sanitization features of my other project: https://friendlybit.com/python/justhtml-sanitization/