this post was submitted on 20 Oct 2025
159 points (98.2% liked)

News


An Amazon Web Services outage has been causing major disruptions around the world. The service provides remote computing services to many apps, websites, governments, universities and companies.

On Downdetector, a website that tracks online outages, users reported issues with Amazon Alexa, Amazon Prime, Snapchat, Ring, Roblox, Fortnite, online broker Robinhood, the McDonald's app and many others.

[–] NuXCOM_90Percent@lemmy.zip 2 points 2 days ago* (last edited 2 days ago) (1 children)

fire your sysadmins and hire DevOps Engineers at 2x the salary

If you aren't managing your own hardware you need far fewer sysadmins.

And while I was fortunate enough to work at a place where the sysadmins understood they were in the service industry, the vast majority of orgs do not have any meaningful communication between the departments which invariably becomes adversarial over time.

DevOps is inherently inefficient because you are paying people to do two jobs (which is why so many companies don't and instead just add more and more responsibilities to the devs who are dumb enough to reveal they have basic linux skills...). But it is also, time and time again, one of, if not THE, most effective ways to actually have "IT" be aware of the needs and use cases of development.

raise a ticket with AWS and wait every time you need more than 5 instances of the same compute type

There is definitely a range where that can bite you and my experience is that the various cloud providers are very good about giving you special service if you are constantly hovering there. But the vast majority of companies either don't need to scale past that or do it "once" during the initial deployment.

pay a premium for the same amount of CPU & RAM you could’ve gotten from your classic VPS provider (...) oops, our biggest DC got knocked offline, here’s some compute time credits

You're paying extra for the stability and uptime as well as the customer service. And, speaking from experience, the vast majority of "traditional" VPS companies "guarantee" Five Nines by having a skeleton crew with a pager app on their phones who may or may not even be awake during their shifts. And the best you get is an acknowledgement and stalling until the main staff come back up.

Skimming Downdetector? The worst of it was around 0300 Eastern time, with large mitigations by 0700. It looks like it is spiking again as of 1000, though.

By all means. Rake Bezos's shitty face across the coals and get a massive credit on your bill. But if we are judging a company by their service at their worst? This is NOTHING compared to potentially multi-day outages and needing to manually migrate our own services because "We can't get anyone out to the data center until Wednesday" and so forth.

VPSes are spectacular for hobbyist use and company websites, even for places in The City. But if you are providing a nationwide or even worldwide service? You want a proper data center with support staff, which more or less means "the cloud". And while I think a LOT of companies should take that into consideration? Pretty much everything on Downdetector et al. that actually impacts people has very good reasons to not just buy a few nodes and manage them itself.

[–] despite_velasquez@lemmy.world 3 points 2 days ago (1 children)

I was going to reference this Medium article on how paying extra for “uptime” and reliability isn’t just a 50-100% premium, but many times a 7-8 figure premium. Those are make-or-break figures for a business model.

The irony is that Medium, a site hosting mostly static content, is still down due to the AWS outage.

[–] NuXCOM_90Percent@lemmy.zip 2 points 2 days ago (1 children)

I've (presumably) seen that article in the past. It is very much something that every company needs to evaluate for themselves but my experience? That (scaling for company size) premium is usually within discussion range of being worth it. In large part because... finding the kind of staff who gets within even a Three Nines range of uptime is a major undertaking and something you generally only can test when shit is hitting the fan.

So you tend to get analyses both ways. "If everything goes right, X is much cheaper than Y". Which falls apart when you realize that it is someone else's problem to make Y viable and you can always sue the fuck out of them if they screw up badly enough. So it ends up being "Well, our forecasts are that X would cost 4 million a year and Y would cost 6 million a year... but we save N on compensation and we don't have to deal with staffing or HR... Eh, we're probably out a million but our revenue can handle it and then we don't have to deal with it"
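To put rough numbers on that last back-of-the-envelope: a minimal sketch, assuming the 4 million and 6 million forecasts above and an invented figure for N (the thread never gives one; 1 million is picked only so the result lines up with "probably out a million"). The function name is made up for illustration.

```python
def net_premium(self_hosted: float, managed: float, staffing_saved: float) -> float:
    """Extra yearly cost of the managed option once staffing savings are subtracted."""
    return managed - self_hosted - staffing_saved


X = 4_000_000  # forecast yearly cost of option X (from the comment)
Y = 6_000_000  # forecast yearly cost of option Y (from the comment)
N = 1_000_000  # ASSUMPTION: compensation saved by not staffing it in-house

print(net_premium(X, Y, N))  # 1000000
```

The whole argument hinges on N and on whether revenue can absorb what is left over, so it is worth rerunning with your own figures rather than trusting these.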

[–] despite_velasquez@lemmy.world 1 points 2 days ago* (last edited 2 days ago) (1 children)

I think most companies don’t have a three nines SLA with their customers, yet were sold the idea that cloud (… and then serverless) should be the right decision for them.

When the initial cloud migration happened, I saw a handful of startups and scale-ups go bankrupt doing lift and shift.

Don’t get me wrong, I agree with what you’re saying. My point is more about the tribal consensus that was built in the tech community around 2016-2018: that the cloud is the future, for everyone, and that managing your own infrastructure is being a brute.

[–] NuXCOM_90Percent@lemmy.zip 1 points 2 days ago* (last edited 2 days ago)

With ANY of the "nines" notation, a good rule of thumb is to move the decimal point two or three spots to the left. But it is more the mindset and planning built around that.
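For anyone who hasn't done the arithmetic on what each "nine" actually buys, here is a minimal sketch. It assumes a 365-day year and counts all downtime against one yearly budget. Real SLAs usually measure per month and carve out exclusions, so treat the numbers as ballpark. The function names are made up for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def availability(nines: int) -> float:
    """Availability fraction for N nines, e.g. 3 -> 0.999."""
    return 1 - 10 ** -nines


def downtime_budget_minutes(nines: int) -> float:
    """Minutes of downtime per year allowed at N nines."""
    return MINUTES_PER_YEAR * 10 ** -nines


def fits_budget(outage_minutes: float, nines: int) -> bool:
    """True if a single outage alone stays within the yearly budget."""
    return outage_minutes <= downtime_budget_minutes(nines)


for n in (2, 3, 4, 5):
    print(f"{n} nines: {availability(n):.5f} -> "
          f"{downtime_budget_minutes(n):.1f} min/year")
```

Each nine you drop multiplies the allowed downtime by ten: roughly 5 minutes a year at five nines, about 8.8 hours at three. A single four-hour incident (240 minutes, an assumed round figure) passes `fits_budget(240, 3)` but fails `fits_budget(240, 4)`.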

For MOST companies and products? "Shit broke, we'll fix it in the morning" is 100% reasonable. But when you are big enough that you are on the front page of Downdetector? EVERYONE comes out of the woodwork to insist you are horrible and mismanaged and blahdy blah blah. Which might actually have investor implications.

Which is the other aspect. If I am going to pay a hosting company (with my business hat on), I need some uptime metrics/guarantees. Violate those and I am expecting compensation. Violate those sufficiently and my bosses are going to have the lawyers see how much of our bad Q2 we can blame on the hosting company. And... there is a lot of value in the department head's responsibility being sending angry emails to Amazon rather than figuring out what employee is getting fired... and if it is them.

But yeah. I saw someone else make the joke of "on -> off -> on -> off" prem cycles but... that is kind of reality.

When you are three people in a garage moonlighting in a way that you can pretend this all started after you all turn in your notice (seriously. One of my favorite goofing off activities is to check the repository of any company that actually has an open source project and laugh at how many MRs and commits were apparently done over the course of a month and TOTALLY weren't rewritten for legal purposes)? Your very initial proof of concept might be a server in a closet but you very rapidly will shift to "the cloud" because you don't have the resources for a full time IT person to even manage the VPS, let alone a rack.

Then, as you get bigger, you hire that sysadmin and either switch to a VPS or on prem to save money. Then you get bigger still and realize that sysadmin's team is as big as engineering and start looking for ways to cut/offload costs... which tends to be The Cloud.

Then you get sufficiently large and have the kinds of customers where data protection is a full time job and start realizing it makes more sense to hire back the two or three competent sysadmins you had and rent some place in a data center. And THEN you get big enough that the entire world notices if you go down for 5 minutes and...

And... yeah. A lot of companies will fail at one of those points. Partially because they don't run the numbers and factor in their runway. But also because those tend to be when work structures are most taxed. A whiteboard where people grab index cards works until you have teams that might not be fully staffed by people with double digit percentages of the company stocks and so forth.