An Alternative Take on the Mozilla Cert Fiasco

The FOSS community is quick to throw shade on anyone who messes up or does something perceived to be wrong, more so when that someone is a company. It's a bit sad, really: there is no loyalty in this community. You mess up and you might as well pack it in. So when a company like Mozilla messes up, the haters -- not people who hate Mozilla specifically, but the trolls, the people who live and breathe for the chance to hate -- come crawling out of the woodwork en masse.

Hacker News has been on fire with vitriol. I have seen numerous suggestions that the expired certificate which brought down Firefox add-ons worldwide was a ploy to get more users to enroll in Firefox's testing/analytics program (as those with this option enabled were the first to see a resolution). I have seen comments suggesting that Mozilla is done for, that Mozilla is as evil as Google, and that this marks the end of the Firefox brand.

I'm not going to speculate on the fate of the Firefox brand, but I do have an alternative view on Mozilla's expired certificate fiasco. Picture, if you will, that it's about 30 days prior to the certificate's expiration. A tired sysadmin in Mozilla's NOC gets a warning in Nagios that a cert is set to expire in 30 days. He makes a ticket for it, sends it off to the owning party, and puts the alert in downtime until the critical threshold, say 10 days out.
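For the non-sysadmins reading along, the kind of check I'm picturing looks roughly like the sketch below. To be clear, this is my own illustration and not Mozilla's actual monitoring: the function names, and the 30- and 10-day thresholds, are just the assumptions from my story. The only thing borrowed from the real world is the Nagios plugin convention of returning 0 for OK, 1 for WARNING, and 2 for CRITICAL.

```python
import socket
import ssl
from datetime import datetime, timedelta, timezone

# Nagios plugin convention: 0 = OK, 1 = WARNING, 2 = CRITICAL.
OK, WARNING, CRITICAL = 0, 1, 2


def classify_expiry(not_after, now, warn_days=30, crit_days=10):
    """Map time-until-expiry to a Nagios-style status.

    warn_days and crit_days are the hypothetical thresholds from the story.
    An already expired cert has negative time remaining, so it is CRITICAL.
    """
    remaining = not_after - now
    if remaining <= timedelta(days=crit_days):
        return CRITICAL
    if remaining <= timedelta(days=warn_days):
        return WARNING
    return OK


def fetch_not_after(host, port=443, timeout=5.0):
    """Read the notAfter date from the certificate a TLS server presents.

    Hypothetical helper for illustration. With the default context, the
    handshake itself fails if the cert has already expired, so this is
    only useful for catching the problem ahead of time.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    seconds = ssl.cert_time_to_seconds(cert["notAfter"])
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


if __name__ == "__main__":
    # Made-up expiry date, checked four hours before the deadline.
    expiry = datetime(2030, 1, 31, tzinfo=timezone.utc)
    status = classify_expiry(expiry, expiry - timedelta(hours=4))
    print({OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}[status])
```

Notice that the check itself is the easy part. Nothing in it knows who owns the cert, which provider issued it, or who has to approve the purchase.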

Fast forward to 10 days before expiration. Different shift, different sysadmin. This guy sees the alert has gone critical. He checks the Nagios history; there's a ticket that it's been acknowledged with. OK, he checks the ticket. Nothing has happened, but it's with the owning party. Cool, he doesn't need to worry -- it's not uncommon to wait until the 11th hour to roll out the new cert. Meanwhile, the owning party is swamped or has a backlog. An expiring cert is not even on their radar -- heck, they're not even sysadmins, and certs are clearly the job of the systems administrator. The ticket is ignored.

It's 20:00 UTC; in four hours the cert will expire. The admin on duty in the NOC sees this alert again. It's about to expire. He checks the history, checks the ticket. No one has touched it. So he checks the knowledge base, and sure enough there are some vague instructions on how to install a new certificate. He gives it a go, but which certificate provider does this cert use? Or maybe it's a matter of needing approval; certs cost money when you're not using Let's Encrypt. Or maybe the instructions are just too vague.

The point is, this sounds to me like just another day in the life of an enterprise. An issue is detected well in advance, and the issue is ignored or passed off. It comes back up, but it has already been acknowledged, so it's ignored again. By the time the issue is about to become an incident it's too late: there isn't enough knowledge on hand, or something else is in the way. The FOSS community likes to paint itself as a bunch of tech-savvy insiders. But the reactions to this certificate screw-up look to me more like replies from outsiders who have never set foot in a datacenter.

I don't know, but I think the FOSS community needs to show a little more grace and mercy.