When last Friday’s CrowdStrike update caused a massive IT outage, Neal Juern’s day started like that of many other MSPs: a scramble to help customers get their systems back up and running.
Thanks to a solid incident response (IR) plan, some creative thinking, and a key vendor relationship, the MSP had most of its affected customers up and running by the end of the day, with a few stragglers remaining until Monday morning. Plus, they took the opportunity to hold a post-mortem, applying the lessons learned to become more resilient for any future event, says the CEO of 7tech, a national MSP/MSSP in San Antonio, Texas.
Blue Screen Of Death
The IT outage first came to light at about 3:00 am, when his overnight techs noticed that their RMM, which they host internally, had gone offline. Yep, they saw the “blue screen of death.” They quickly figured out it was a CrowdStrike issue, but because the RMM was down, “we didn’t have full visibility into all of our client systems to know if they were up or down,” Juern recounts. “But once we got that back online, which took about an hour, we started to see, ‘Oh my gosh, this isn’t just us.’”

As we all now know, it was an update from CrowdStrike that caused the outage. “Even though they [CrowdStrike] pushed out that bad file, because it doesn’t get rolled to every single system immediately, and they then corrected it, it didn’t impact every system,” Juern explains.
Juern says it did impact about 50% of their customers’ Windows servers and about 20% of their workstations.
Putting The IR Plan Into Action
The manual process of getting their own systems back up began around 5:00 am, he says, and by then all staff were following the IR plan.
“We were pleased with the way our incident response plan worked,” Juern says. The team was first alerted via a Teams channel. The plan specifies that “nobody comes into our office. They all stay where they’re at [so they] don’t waste any time commuting. They start work early and start working with clients,” he explains. The operations team communicates via a dedicated Teams channel, sharing knowledge of what’s working and what isn’t. Should the Teams channel become unavailable, everybody knows to switch to a channel they built inside RingCentral.
“All our team just put their heads down and, man, they cranked it out pretty fast.”
Why Vendor Relationships Matter
The outage required a manual fix on each affected server and workstation, and 7tech was well under way with that work when Juern received a text message from one of his vendor partners, ThreatLocker CEO Danny Jenkins, who knew Juern also used CrowdStrike. Jenkins suggested a fix that would be faster than doing it manually, something Juern had been thinking about too, he recalls.
“We both kind of had the same idea, which is, because CrowdStrike and ThreatLocker both start up as what are called kernel drivers, they start up very early in the boot-up process and they also have exclusive access. The idea that I was talking to Danny about was [trying to] have ThreatLocker block access to the bad file for CrowdStrike … and then it will boot up. Then the CrowdStrike could push out that newer file.”
After Jenkins and Juern brainstormed, they had their two top technical people build a storage rule that blocked access to the file, which took a few hours to get right. Then ThreatLocker “published it into their threat ops library where any CrowdStrike partner could then just grab it and apply that rule and get their systems up,” Juern says.
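The exact rule ThreatLocker published isn’t public, and its storage-control policies are built in the ThreatLocker console rather than written as code. Conceptually, though, the rule encodes a single deny decision, sketched below in Python purely for illustration, assuming the faulty channel file matched CrowdStrike’s published C-00000291*.sys pattern (the function and names here are illustrative, not ThreatLocker’s actual rule format):

```python
# Illustrative sketch only: ThreatLocker storage-control rules are configured
# in its console, not written in Python. This models the decision the rule
# encodes: deny reads of the faulty channel file so the CrowdStrike kernel
# driver never loads it and the machine can finish booting.
import fnmatch
import ntpath

CROWDSTRIKE_DIR = r"C:\Windows\System32\drivers\CrowdStrike"
BAD_CHANNEL_FILE = "C-00000291*.sys"  # pattern from CrowdStrike's advisory

def storage_rule_allows(path: str) -> bool:
    """Return False (block) for the faulty channel file, True otherwise."""
    folder, filename = ntpath.split(path)
    if folder.lower() == CROWDSTRIKE_DIR.lower() and fnmatch.fnmatch(
        filename.lower(), BAD_CHANNEL_FILE.lower()
    ):
        return False  # blocked: the sensor boots without parsing the bad file
    return True

# The faulty file is denied; a healthy channel file is still allowed.
assert not storage_rule_allows(
    r"C:\Windows\System32\drivers\CrowdStrike\C-00000291-00000000-00000032.sys"
)
assert storage_rule_allows(
    r"C:\Windows\System32\drivers\CrowdStrike\C-00000290-00000000-00000001.sys"
)
```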
Juern says that workaround sped up their recovery process tremendously. “That was a beautiful thing.”
He adds, “They [ThreatLocker] were proactively reaching out to other ThreatLocker clients that had CrowdStrike because they were seeing the logs of machines booting up and then blue screening.”
For the few systems where the workaround didn’t work, Juern’s techs did the manual fix; in other cases they rebooted several times and the system would come up, he says. (Repeated reboots sometimes worked because, on a given boot, the sensor could pull down the corrected file before the faulty one crashed the machine.)
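For reference, the manual fix CrowdStrike published was to boot each affected machine into Safe Mode or the Windows Recovery Environment and delete the faulty channel file from the sensor’s driver directory. In the field that was a one-line del command; the sketch below expresses the same step in Python for illustration only, assuming the machine is already booted into an environment where the file can be deleted:

```python
# Sketch of the deletion step from CrowdStrike's published remediation
# guidance: remove the faulty channel file so the sensor can boot cleanly.
# Techs typically ran this as a "del" command from Safe Mode.
import glob
import os

# Pattern from CrowdStrike's remediation advisory for the July 19 incident.
BAD_FILES = r"C:\Windows\System32\drivers\CrowdStrike\C-00000291*.sys"

for path in glob.glob(BAD_FILES):
    print(f"Deleting faulty channel file: {path}")
    os.remove(path)  # after a reboot, the sensor pulls the corrected file
```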
The Post-Mortem
While Juern was happy with how his staff executed the IR plan, “every time you use it, you learn how you need to improve it,” he says. He identified some gaps in the escalation process “so we’re going to tighten that up a little bit.”
They are also discussing how to make their internal infrastructure more resilient “so that if something impacts our clients, we’re not impacted, and we have a better view of what’s really happening.” He is contemplating delaying CrowdStrike updates for a few internal systems. “They’ll still have ThreatLocker. But if we can delay [CrowdStrike updates] 12 hours or a day, that gives us time to see what’s going on.”
He cautions that clients shouldn’t delay security updates, however.
Juern adds that criticism of CrowdStrike fails to tell the whole story. “It’s an interesting dilemma that CrowdStrike is in,” he says. “CrowdStrike does not have the luxury of being able to do extensive testing on updates before they push them out. Otherwise, they’re actually leaving their clients at risk.”
The Moral Of The Story
For Juern, the moral of this story is “it really solidified that having strong partnerships is important with your vendors. ThreatLocker didn’t have to help us with that, but they did. We have a really good partnership with them so that really meant a lot to us.”
In addition, he says several MSPs in his peer community through TMT reached out immediately to offer help, “which was pretty cool.”
And as for CrowdStrike, he says the incident doesn’t change his opinion that it’s a good product. “As long as they don’t make this a habit, we’ll stick with them.”