How IT Departments Scrambled to Address the CrowdStrike Chaos

24 July 2024

Just before 1:00 am local time on Friday, a system administrator for a West Coast company that handles funeral and mortuary services woke up suddenly and noticed his computer screen was aglow. When he checked his company phone, it was exploding with messages about what his colleagues were calling a network issue. Their entire infrastructure was down, threatening to upend funerals and burials.

It soon became clear the massive disruption was caused by the CrowdStrike outage. The security firm accidentally caused chaos around the world on Friday and into the weekend after distributing faulty software to its Falcon monitoring platform, hobbling airlines, hospitals, and other businesses, both small and large.

The administrator, who asked to remain anonymous because he is not authorized to speak publicly about the outage, sprang into action. He ended up working a nearly 20-hour day, driving from mortuary to mortuary and resetting dozens of computers in person to resolve the problem. The situation was urgent, the administrator explains, because the computers needed to be back online so there wouldn’t be disruptions to funeral service scheduling and mortuary communication with hospitals.

“With an issue as extensive as we saw with the CrowdStrike outage, it made sense to make sure that our company was good to go so we can get these families in, so they’re able to go through the services and be with their family members,” the system administrator says. “People are grieving.”

The flawed CrowdStrike update bricked some 8.5 million Windows computers worldwide, sending them into the dreaded Blue Screen of Death (BSOD) spiral. “The confidence we built in drips over the years was lost in buckets within hours, and it was a gut punch,” Shawn Henry, chief security officer of CrowdStrike, wrote on LinkedIn early Monday. “But this pales in comparison to the pain we’ve caused our customers and our partners. We let down the very people we committed to protect.”

Cloud platform outages and other software issues—including malicious cyberattacks—have caused major IT outages and global disruption before. But last week’s incident was particularly noteworthy for two reasons. First, it stemmed from a mistake in software meant to aid and defend networks, not harm them. And second, resolving the issue required hands-on access to each affected machine; a person had to manually boot each computer into Windows’ Safe Mode and apply the fix.

IT is often an unglamorous and thankless job, but the CrowdStrike debacle has been a next-level test. Some IT professionals had to coordinate with remote employees or multiple locations across borders, walking them through manual resets of devices. One Indonesia-based junior system administrator for a fashion brand had to figure out how to overcome language barriers to do so. “It was daunting,” he says.

“We aren’t noticed unless something wrong is happening,” one system administrator at a health care organization in Maryland told WIRED.

That person was awoken shortly before 1:00 am EDT. Screens at the organization’s physical sites had gone blue and unresponsive. Their team spent several early morning hours bringing servers back online, and then had to set out to manually fix more than 5,000 other devices within the company. The outage blocked phone calls to the hospital and upended the system that dispenses medicine—everything had to be written down by hand and run to the pharmacy on foot.

Complicating the already nightmarish process were staff shortages. The health care system’s technical staff has been cut in recent years, the system administrator says, and that pushed remaining employees to pull 12- to 14-hour days. If they burn out, there is no one to step in. “We all care about the community that we’re serving—we want to make sure that everything is functioning and that people are being taken care of,” the Maryland administrator says. “But it’s really hard to do that when you don’t have enough staff.”

One chief information security officer of a large health system in the US Midwest emphasized that it isn’t uncommon in health care for budgets to be so tight that organizations have to choose between hiring clinical staff and hiring IT support.

Making everything more difficult were impacted PCs encrypted with the Windows security feature BitLocker. “If you’re using BitLocker, jump off a bridge,” a well-known malware analysis and security news account quipped on X on Friday. In the glitched state, users couldn’t enter the BitLocker keys needed to unlock devices and apply the fix without resorting to complicated workarounds. Microsoft released a recovery tool on Saturday that includes a fix for the issue.

The Midwestern CISO says that even though his employer isn’t a CrowdStrike customer, his team still had to manually address issues on about 120 computers that were running the affected software. But his organization’s biggest disruptions came from partners and other third parties that were directly affected and dealing with outages.

“Medicaid eligibility was down,” he says. “Social Security eligibility was down. Local towns that we do business with were down. And I talked to people at other health systems—this blindsided IT departments to the point that it was all hands on deck. There were nontechnical people running around with USB flash drives doing the fixes at some of America’s largest hospital systems.”

CrowdStrike said on Monday that “a significant number” of the 8.5 million impacted devices “are back online and operational.” And IT professionals tell WIRED that, after a grueling few days, the majority of their organizations’ systems have been restored. But it will take time to reach every machine, everywhere. And the situation has raised deeper questions about how monitoring software is designed and how interconnected today’s digital systems are.

“All it takes is one infrastructure vendor like CrowdStrike,” the health care CISO says. “What’s happened is companies have an emphasis on getting things into production and not taking people off the floor to train them about what to do if something goes down.”

He says that the health care system he works for is back to normal now, but not all of its partners are. “If I’m a CrowdStrike customer,” he says, “the first thing I’m worried about is whether that company is even going to continue to exist in its current form after this.”

Source: Wired
