
It is tough, though, for me to fully buy labor statistics when it has become the norm recently for them to be revised down. This spans back into Biden's term as well, so it isn't a one-party issue.

With a valid measure I would expect a roughly even distribution over time between underestimates and overestimates. For a measure worth considering, I'd also expect the stat to be released later, when revisions are less likely because more actual data has been collected.



> With a valid measure I would expect a roughly even distribution over time between underestimates and overestimates

This is a reasonable hypothesis. It’s wrong, and I’ll explain why.

If measurement errors were i.i.d., you’d be correct. But they’re not, and this is well documented. Early survey results suffer from directional response bias: the employers with the least changes respond first, so the earliest releases tend to match whatever was going on before. Then the employers who had paperwork to do respond. And then, finally, someone gets around to calling the folks who never got back. Some of them aren’t around anymore.

So yeah, the directional tendency in revisions is well documented. And for a long time, the early releases were appreciated. But maybe American statistical and media literacy is such that only final releases should be published, which would mean we’d always be working with data six months to a year out of date.
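To make the mechanism concrete, here’s a toy simulation (invented numbers and a made-up estimator, not actual BLS methodology): the stable firms respond first, and each release scales up whatever responses are in so far.

    import random

    random.seed(42)

    N = 10_000
    # Hypothetical economy in a mild downturn: the average firm sheds ~2 jobs.
    changes = [random.gauss(-2, 10) for _ in range(N)]

    # Firms with big changes (layoffs, hiring sprees, closures) have more
    # paperwork and respond later, so order responses from most stable first.
    changes.sort(key=abs)

    def estimate(k):
        # Scale the first k responders' total change up to all N firms.
        return sum(changes[:k]) * N / k

    print(f"estimate at  25% response: {estimate(N // 4):+,.0f}")    # near zero
    print(f"estimate at  60% response: {estimate(int(N * 0.6)):+,.0f}")
    print(f"final    at 100% response: {estimate(N):+,.0f}")         # the true change

In a downturn, each successive release folds in more of the high-change firms, so the estimate gets revised down release after release. Nobody is cooking anything; the late responders are just where the change is.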


That's all well and good in theory, but jobs report data over recent years have noticeably shifted towards downward monthly revisions. Prior to the pandemic response, the graph [1] looks much more balanced with regard to positive and negative revisions.

[1] https://www.apmresearchlab.org/blog/how-abnormal-are-the-rev...


Sure, but it's totally ridiculous to post about that without discussing the survey response rate, which is the cause of that drift. People are attributing it to political meddling, and that is baseless.

Naturally all of this metadata about the BLS surveys is available for free from the BLS, so you can just go look at it.


Interesting that you're claiming this is baseless without providing any sources for your alternative. How do you know (a) that the response rate is down meaningfully and (b) that the data shows a strong correlation, let alone causation, between the two?


> but jobs report data over recent years have noticeably shifted towards downward monthly revisions. Prior to the pandemic response, the graph [1] looks much more balanced with regard to positive and negative revisions

Yes. The reasons for this are well documented. Changing the methodology for the preliminary estimates is a rigorous process. That means our published estimates lag the best estimates, something the primary sources note in every release, if one gets past the headlines.

Also, if you have one year of massive job gains and four years of flat or falling employment, you’ll spend most of that epoch biased one way. Again, not a sign of methodological problems, just a predictable methodological artifact that people are supposed to incorporate before using, much less emotionally reacting to, the data.


Why would the shift to a new methodology bias the estimates to one end? I would expect a new methodology to make comparisons between the two systems potentially unhelpful, but I wouldn't expect a valid methodology to bias one way or the other.

Relatedly, I wouldn't expect past data to bias a current estimate. If 6 or 12 months of positive growth biases the next prediction, it falls into the hot-hand fallacy. It isn't predicting based on current data; it's predicting based on recent past behavior and extrapolating forward. That only makes sense to do if the data is not yet available, and even then the extrapolation isn't a useful estimate of current conditions.


> If 6 or 12 months of positive growth biases the next prediction, it falls into the hot-hand fallacy

It’s a sample of a sample. The full sample is the final release. The early results are the preliminary releases. When firms change things, they take longer to respond. So whichever way the economy is moving, there will be bias in that direction. If the economy is turning, you won’t know the direction. If it’s accelerating or slowing, you won’t know the magnitude. Sometimes context clues can help. Sometimes they can’t. There is no known statistical treatment for intuiting the missing data before one has it.
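A sketch of that, reusing the toy responders-sorted-by-stability model from my earlier comment (invented numbers, not the real estimator):

    import random

    random.seed(0)

    def early_minus_final(mean_change, n=10_000, response_rate=0.25):
        # Gap between the preliminary and final estimates when the most
        # stable firms respond first.
        changes = sorted((random.gauss(mean_change, 10) for _ in range(n)), key=abs)
        k = int(n * response_rate)
        return sum(changes[:k]) * n / k - sum(changes)

    print(early_minus_final(+3))  # boom: preliminary too low, revised up later
    print(early_minus_final(-3))  # bust: preliminary too high, revised down later
    print(early_minus_final(0))   # turning point: the sign is a coin flip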


We agree here, and I'm going a step further: the initial numbers are useless and are little more than throwing opinionated darts. Numbers shouldn't be released until they meet some reasonable level of response and statistical validity. Given that they do release numbers today, I judge them as early and either inaccurate and useless or politically motivated to push markets while there's no meaningful data to contradict them.


> the initial numbers are useless and are little more than throwing opinionated darts

You’re still arguing from ignorance. They are not. A better question would be to ask to whom they’re useful, and how.

Like, if a fire is burning in a neighborhood, every sighting is valuable. You don’t always need to wait for a comprehensive picture before being able to do anything.

> I judge them as early and either inaccurate and useless or politically motivated to push markets while there's no meaningful data to contradict them

That’s wrong. But it seems to be a common error.

Maybe the solution is to gatekeep these numbers. Policymakers, academics, enterprises and banks can get a rarefied sheet for a fee. But the public doesn’t get PDFs, much less public reporting.

> while there's no meaningful data to contradict them

There are bajillions of them. ADP. State reports. Private surveys. Fed studies. That said, I’m leaning towards your view—maybe these data aren’t best made broadly public.


> Like, if a fire is burning in a neighborhood, every sighting is valuable. You don’t always need to wait for a comprehensive picture before being able to do anything.

That assumes there are a meaningful number of reliable reports. If I'm regularly told there is a fire, only to have the authorities come back a week later and revise the reports down, I wouldn't trust them. If they overestimated the number of fires based on the last 6-12 months of fire data, with little recent data to go on otherwise, I would ignore the reports.

> Maybe the solution is to gatekeep these numbers.

This seems more reasonable at least, though I still don't see much use in data that's released so early it's based primarily on recent historical trends and a handful of survey responses.

> There are bajillions of them.

Those are all anecdotal in this case. If those sources were applicable and reliable, the official data would incorporate them and the reporting would be more accurate. My point was that the official reports depend on survey results, and when they're published so early that few responses are in, there is no official data to contradict the early reporting.


That is a reasonable position; however, the assumption that it is the administration gaming them, versus other motivated parties, is open for discussion.


It is in fact not at all reasonable. They are saying that the BLS stats can't be trusted because they totally misunderstand the survey methodology. That isn't a reason!


I’d counter that if we were doing a good job gathering data, these structural biases could be compensated for with more conservative initial numbers.

At some point, declining to take compensating action becomes faking the numbers.


> if we were doing a good job gathering data, these structural biases could be compensated for with more conservative initial numbers

There is no more conservative. The data will bias in the direction of the trend. The point of the data is, in part, to measure that trend. Fucking with it to make it palatable to the statistically illiterate is precisely the sort of degradation of the data we’re worried about.

(They’re also useless as a time series if the methodology changes quarter to quarter. That’s the job of analysis. Not the data.)


What you wrote suggests the data will bias predictably, which matches my understanding.

Reporting biased data as the default, on the assumption that the audience will compensate for the bias themselves, seems like a weak argument against improving.

They could preserve data visibility and granularity by continuing to release the prior numbers as previously calculated, while changing the calculation of the headline number to be better compensated.

The simpler argument is that changing it at all will result in a negative step change in the reporting that no one wants to take accountability for.


> What you wrote suggests the data will bias predictably

Ex post facto. Before the fact, we don’t know.

Imagine you know the weather will include a strong gust, direction unknown. Averaging the models will produce a central estimate, but you know the outcome will be biased away from the center. You just don’t know, until it happens, in which direction.
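A toy numeric version of the analogy (made-up numbers):

    import random
    import statistics

    random.seed(1)

    # Each "model" forecasts a strong gust but disagrees on the direction.
    ensemble = [random.choice([-1, 1]) * random.gauss(40, 5) for _ in range(1_000)]

    print(statistics.mean(ensemble))  # near 0: the directions cancel in the average
    print(ensemble[0])                # an actual outcome: roughly +/-40, far from 0

The central estimate is the best you can publish before the fact, even though you know it will be wrong in one direction or the other.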

> They could preserve data visibility and granularity by continuing to release the prior numbers as previously calculated, while changing the calculation of the headline number to be better compensated

They do. These data are all recalculated with each methodological change. They’re just deprecated indices the media don’t report on because they’re of academic, not broad, concern.

> simpler argument is that changing it at all will result in a negative step change in the reporting

Simpler but wrong. Those data would be useless for the same reason we don’t let CEOs smooth revenues.


I’m confused by this discussion. It seems like you said the biases were structural because we know who reports early and that is why the early numbers are always revised down. Structural implies known in advance.

It also seems like you said they shouldn’t revise the numbers but now you are saying they already do.

What am I misunderstanding?


> It is tough, though, for me to fully buy labor statistics when it has become the norm recently for them to be revised down.

There have been revisions since forever, because the estimates depend in part on surveys, and if companies (and the people within them) don't bother responding in a timely or accurate manner, that's going to throw the sampling off.

> CES estimates are considered preliminary when first published each month because not all respondents report their payroll data by the initial release of employment, hours, and earnings. BLS continues to collect payroll data and revises estimates twice before the annual benchmark update (see benchmark revisions section below).

* https://www.bls.gov/opub/hom/ces/presentation.htm#revisions

Post-COVID, surveying seems to have become more difficult (and BLS budget stagnation/cuts haven't helped). This has been a known issue for a while; see the Odd Lots episode "Some of America's Most Important Economic Data Is Decaying":

> Gathering official economic data is a huge process in the best of times. But a bunch of different things have now combined to make that process even harder. People aren't responding to surveys like they used to. Survey responses have also become a lot more divided along political lines. And at the same time, the Trump administration wants to cut back on government spending, and the worry is that fewer official resources will make tracking the US economy even harder for statistical departments that were already stretched. Bill Beach was commissioner of labor statistics and head of the US Bureau of Labor Statistics during Trump's first presidency and also during President Biden's. On this episode, we talk to him about the importance of official data and why the rails for economic data are deteriorating so quickly.

* https://www.youtube.com/watch?v=nfgpqVixeIw


My argument wasn't that there shouldn't be revisions, though, only that recent years have shown consistently negative revisions rather than a roughly even distribution.

If response rates are down or something else is making surveys more difficult, it's reasonable that confidence intervals would widen and the size of revisions would increase. It's unreasonable that difficulty in surveying would lead to a consistent bias in the results, though; that's a methodological issue at best.


> My argument wasn't that there shouldn't be revisions, though, only that recent years have shown consistently negative revisions rather than a roughly even distribution.

It's been too many moons since I took a prob/stats course to comment accurately on population sampling, but how valid is the assumption that errors 'should' skew both positive and negative?


If errors are skewed in one direction there would likely have to be a factor forcing it, like sampling and response bias.

That's always possible, though again I question the validity of the measure and its results if it's getting consistently skewed results. Either the methodology is faulty or the results simply can't be trusted, because they can't reliably get good data.
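For a sense of scale, here's a quick one-sided sign test with made-up counts (hypothetical, not actual BLS figures): if under- and over-estimates really were equally likely and independent, how surprising would a mostly-downward year be?

    from math import comb

    # Null hypothesis: downward and upward revisions are equally likely,
    # so the count of downward revisions in n months is Binomial(n, 0.5).
    n, down = 12, 10  # hypothetical: 10 downward revisions in 12 months
    p = sum(comb(n, k) for k in range(down, n + 1)) / 2**n
    print(f"P(>= {down} of {n} downward | symmetric errors) = {p:.3f}")  # ~0.019

A run like that would be unlikely under symmetric, independent errors, but the response-bias mechanism discussed upthread makes the errors correlated, so independence is exactly the assumption in dispute.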



