> You will have these problems. Not as often as the larger companies but to imagine that they simply don't exist is the opposite of sound engineering.
A lot of those failure mode examples seem well suited to client-side retries and appropriate rate limiting. If we're talking file uploads then sure, there absolutely are going to be cases where the benefits of having clients go to the third-party is more beneficial than costly (high variance in allowed upload size would be one to consider), but for simple upload cases I'm not so convinced that high-level client retries aren't something that would work.
> if they never got called in the first place due to misconfiguration
I find it hard to believe that having more components to monitor will ever be simpler than fewer. If we're being specific about vendors, the AWS console is IMHO the absolute worst place to go for a good centralized logging experience, so you almost certainly end up shipping your logs into a better centralized logging system that has more useful monitoring and visualisation features than CloudWatch and has the added benefit of not being the AWS console. The cost here? Financial, time, and complexity/moving parts for moving data from one to the other. Oh and don't forget to keep monitoring on the log shipping component too, that can also fail (and needs updates).
> The protocol provided by S3 is available through dozens of vendors.
It's become a de facto standard for sure, and is helpful for other vendors to re-implement it but at varying levels of compatibility.
> It only matters if it is of equivalent or lessor cost.
This is precisely the point, I'm saying that adding boxes in the system diagram is a guaranteed cost as much as a potential benefit.
> Yet you explicitly ignore these
I repeatedly mentioned things that to me count as complexity that should be considered. Additional moving parts/independent components, the associated monitoring required, repository sprawl, etc.
> No, I just read the documentation, and then built it.
I also just 'read the documention and built it', but other comments in the thread allude to vendor-specific training pushing for not only vendor-specific solutions (no surprise) but also the use of vendor-specific technology that maybe wasn't necessary for a reliable system. Why use a simple pull-based API with open standards when you can tie everything up in the world of proprietary vendor solutions that have their own common API?
A lot of those failure mode examples seem well suited to client-side retries and appropriate rate limiting. If we're talking file uploads then sure, there absolutely are going to be cases where the benefits of having clients go to the third-party is more beneficial than costly (high variance in allowed upload size would be one to consider), but for simple upload cases I'm not so convinced that high-level client retries aren't something that would work.
> if they never got called in the first place due to misconfiguration
I find it hard to believe that having more components to monitor will ever be simpler than fewer. If we're being specific about vendors, the AWS console is IMHO the absolute worst place to go for a good centralized logging experience, so you almost certainly end up shipping your logs into a better centralized logging system that has more useful monitoring and visualisation features than CloudWatch and has the added benefit of not being the AWS console. The cost here? Financial, time, and complexity/moving parts for moving data from one to the other. Oh and don't forget to keep monitoring on the log shipping component too, that can also fail (and needs updates).
> The protocol provided by S3 is available through dozens of vendors.
It's become a de facto standard for sure, and is helpful for other vendors to re-implement it but at varying levels of compatibility.
> It only matters if it is of equivalent or lessor cost.
This is precisely the point, I'm saying that adding boxes in the system diagram is a guaranteed cost as much as a potential benefit.
> Yet you explicitly ignore these
I repeatedly mentioned things that to me count as complexity that should be considered. Additional moving parts/independent components, the associated monitoring required, repository sprawl, etc.
> No, I just read the documentation, and then built it.
I also just 'read the documention and built it', but other comments in the thread allude to vendor-specific training pushing for not only vendor-specific solutions (no surprise) but also the use of vendor-specific technology that maybe wasn't necessary for a reliable system. Why use a simple pull-based API with open standards when you can tie everything up in the world of proprietary vendor solutions that have their own common API?