
The key is that you pay for bandwidth, sampling, cardinality, and schema considerations somewhere in the system. Depending on the problem, you may be able to get away with dealing with those issues later rather than earlier, but at some point you have to start dealing with required fields, aggregation, etc.

I own a system at one of the tech giants where sampling is counterproductive for our problem domain. Deciding where in the pipeline to deal with each of those issues is our bread-and-butter problem, and it can sometimes swing costs by millions of dollars.


