How IT Departments Scrambled to Address the CrowdStrike Chaos

Just before 1:00 am local time on Friday, a system administrator for a West Coast company that handles funeral and mortuary services woke up suddenly and noticed his computer screen was aglow. When he checked his company phone, it was exploding with messages about what his colleagues were calling a network issue. Their entire infrastructure was down, threatening to upend funerals and burials.

It soon became clear the massive disruption was caused by the CrowdStrike outage. The security firm accidentally caused chaos around the world on Friday and into the weekend after pushing a faulty software update to its Falcon monitoring platform, hobbling airlines, hospitals, and other businesses, both small and large.

The administrator, who asked to remain anonymous because he is not authorized to speak publicly about the outage, sprang into action. He ended up working a nearly 20-hour day, driving from mortuary to mortuary and resetting dozens of computers in person to resolve the problem. The situation was urgent, the administrator explains, because the computers needed to be back online so that funeral scheduling and the mortuaries' communication with hospitals would not be disrupted.

“With an issue as extensive as we saw with the CrowdStrike outage, it made sense to make sure that our company was good to go so we can get these families in, so they’re able to go through the services and be with their family members,” the system administrator says. “People are grieving.”

The flawed CrowdStrike update crashed some 8.5 million Windows computers worldwide, sending them into a loop of the dreaded Blue Screen of Death (BSOD). “The confidence we built in drips over the years was lost in buckets within hours, and it was a gut punch,” Shawn Henry, chief security officer of CrowdStrike, wrote on LinkedIn early Monday. “But this pales in comparison to the pain we’ve caused our customers and our partners. We let down the very people we committed to protect.”

Complicating the already nightmarish process were staff shortages. A system administrator at a health care system in Maryland says its technical staff has been cut in recent years, pushing the remaining employees to pull 12- to 14-hour days. If they burn out, there is no one to step in. “We all care about the community that we’re serving—we want to make sure that everything is functioning and that people are being taken care of,” the Maryland administrator says. “But it’s really hard to do that when you don’t have enough staff.”

One chief information security officer (CISO) at a large health system in the US Midwest emphasized that it isn’t uncommon in health care for budgets to be so tight that organizations have to choose between hiring clinical staff and hiring IT support.

Making everything more difficult were impacted PCs encrypted with the Windows security feature BitLocker. “If you’re using BitLocker, jump off a bridge,” a well-known malware analysis and security news account quipped on X on Friday. In the glitched state, users couldn’t enter the BitLocker keys needed to unlock devices and apply the fix without resorting to complicated workarounds. Microsoft released a recovery tool on Saturday that includes a fix for the issue.

The Midwestern CISO says that even though his employer isn’t a CrowdStrike customer, his team still had to manually address issues on about 120 computers that were running the affected software. But his organization’s biggest disruptions came from partners and other third parties that were directly affected and dealing with outages.

“Medicaid eligibility was down,” he says. “Social Security eligibility was down. Local towns that we do business with were down. And I talked to people at other health systems—this blindsided IT departments to the point that it was all hands on deck. There were nontechnical people running around with USB flash drives doing the fixes at some of America’s largest hospital systems.”

CrowdStrike said on Monday that “a significant number” of the 8.5 million impacted devices “are back online and operational.” And IT professionals tell WIRED that, after a grueling few days, most of their organizations’ systems have been restored. But it will take time to reach every machine, everywhere. And the situation has raised deeper questions about how monitoring software is designed and about the interconnections of today’s digital systems.

“All it takes is one infrastructure vendor like CrowdStrike,” the health care CISO says. “What’s happened is companies have an emphasis on getting things into production and not taking people off the floor to train them about what to do if something goes down.”

He says that the health care system he works for is back to normal now, but not all of its partners are. “If I’m a CrowdStrike customer,” he says, “the first thing I’m worried about is whether that company is even going to continue to exist in its current form after this.”

Source: Wired