Crowdstrike screwup destroys commerce

exoticspice1 · Jul 20, 2024

dada_dave said:
Southwest Airlines unaffected

Archaic Windows version saves the day during CrowdStrike outage — Southwest Airlines scrapes by with ancient OS [Updated]

Windows 3.1 and Windows 95 save the day.

www.tomshardware.com

I suppose that’s one way to do it, just run Windows 3.1 and 95. Problem solved.

or if companies could use Linux... Invest in Linux, not Windows.

dada_dave · Jul 20, 2024

exoticspice1 said:
or if companies could use Linux... Invest in Linux and not MS

I wasn’t being serious.

exoticspice1 · Jul 20, 2024

dada_dave said:
I wasn’t being serious.

I know, but I was.

Jimmyjames · Jul 20, 2024

CrowdStrike switched their macOS software to the Endpoint Security framework (user space) in 2020. This blog post describes that.
https://www.crowdstrike.com/blog/crowdstrike-supports-new-macos-big-sur/

I believe they switched their Linux client to use eBPF for a similar reduction in kernel exposure.

casperes1996 · Jul 21, 2024

jbailey said:
The (world) thing was hyperbole but I have to say trying to be a developer in the modern world without internet is quite painful. I don't know what part of the IT infrastructure here uses CrowdStrike (I'm a contractor with my own equipment) but no one had internet. If I had known, I could have brought my iPad and tethered to that but my iPhone tethering is severely throttled.

To my knowledge, no networking infrastructure was hit at all around here (around here meaning all of Denmark).

Cmaier · Jul 22, 2024

Microsoft blames European Commission for global CrowdStrike catastrophe

The worldwide outage of Windows PCs was because of European Commission demands, says Microsoft, and we should get used to it.

appleinsider.com

Microsoft seems to blame the EU for the outage. Their argument is they can’t do what Apple did and push this junk out of kernel-space, because the EU requires that third parties have the same access to the kernel that Microsoft does (for Defender).

Of course, they don’t explain why they can’t spin out their own security software into a separate company, or why they can’t build in SDKs that everyone, including Defender, can use.

But, in any case, I always like blaming the EU.

Nycturne · Jul 22, 2024

Cmaier said:
Of course, they don’t explain why they can’t spin out their own security software into a separate company, or why they can’t build in SDKs that everyone, including Defender, can use.

That would require work, and therefore money, and there would be no clear return on that investment for Microsoft. Lack of a black eye cannot be measured in revenue beforehand. Therefore it is impossible to do such a thing, or even to think in those terms.

(I have literally had arguments with managers on topics like this. It’s the whole "nothing going wrong doesn’t get noticed like a new feature" issue.)

jbailey · Jul 22, 2024

Cmaier said:
Microsoft blames European Commission for global CrowdStrike catastrophe

The worldwide outage of Windows PCs was because of European Commission demands, says Microsoft, and we should get used to it.

appleinsider.com

Microsoft seems to blame the EU for the outage. Their argument is they can’t do what Apple did and push this junk out of kernel-space, because the EU requires that third parties have the same access to the kernel that Microsoft does (for Defender).

Of course, they don’t explain why they can’t spin out their own security software into a separate company, or why they can’t build in SDKs that everyone, including Defender, can use.

But, in any case, I always like blaming the EU.

There is an upcoming API, already in use on Linux, called eBPF (it used to stand for extended Berkeley Packet Filter, but that no longer fits, so it's not an acronym anymore). It will allow software like CrowdStrike to do its job without needing to be a kernel extension. It will still be optional, but reputable companies will all use it to avoid problems like this one. Unfortunately, it isn't quite ready for Windows yet.

casperes1996 · Jul 22, 2024

jbailey said:
There is an upcoming API, already in use on Linux, called eBPF (it used to stand for extended Berkeley Packet Filter, but that no longer fits, so it's not an acronym anymore). It will allow software like CrowdStrike to do its job without needing to be a kernel extension. It will still be optional, but reputable companies will all use it to avoid problems like this one. Unfortunately, it isn't quite ready for Windows yet.

eBPF is coming to Windows? I hadn't heard this, and I can hardly see how. I assume it won't be plug-and-play with existing eBPF programs. (I have a friend who's looked into formal verification strategies for eBPF programs.)

Nycturne · Jul 22, 2024

casperes1996 said:
eBPF is coming to Windows? I hadn't heard this, and I can hardly see how. I assume it won't be plug-and-play with existing eBPF programs. (I have a friend who's looked into formal verification strategies for eBPF programs.)

Microsoft has an OSS project working on it apparently: https://github.com/microsoft/ebpf-for-windows

jbailey · Jul 22, 2024

casperes1996 said:
eBPF is coming to Windows? I hadn't heard this, and I can hardly see how. I assume it won't be plug-and-play with existing eBPF programs. (I have a friend who's looked into formal verification strategies for eBPF programs.)

This was on HN this morning: No More Blue Fridays.

dada_dave · Jul 22, 2024

Apparently CrowdStrike has been causing more limited issues on Linux kernels for a while, despite eBPF:

Security Software News, Analysis and Features | Tom's Hardware

Discover more about Security Software with insights from the experts at Tom's Hardware.

www.tomshardware.com

casperes1996 · Jul 22, 2024

dada_dave said:
Apparently CrowdStrike has been causing more limited issues on Linux kernels for a while, despite eBPF:

Security Software News, Analysis and Features | Tom's Hardware

Discover more about Security Software with insights from the experts at Tom's Hardware.

www.tomshardware.com

Even with the full protection of user space, misbehaving processes can cause issues: excessive file writes, for example. Or, much to my annoyance, crashpad_handler, which often starts eating way too much CPU until it's killed, after which it relaxes for a while. So I guess another moral of this whole story is to try to avoid bad software.

Andropov · Jul 23, 2024

One of the surprising outcomes of this disaster is that, in my circle of non-devs, this has sadly been classified as "the Microsoft issue". The nuance that this was an issue caused by a security software vendor was lost. I don't think anyone could accuse me of defending Microsoft, but this is 100% on CrowdStrike.

Cmaier · Jul 23, 2024

Andropov said:
One of the surprising outcomes of this disaster is that, in my circle of non-devs, this has sadly been classified as "the Microsoft issue". The nuance that this was an issue caused by a security software vendor was lost. I don't think anyone could accuse me of defending Microsoft, but this is 100% on CrowdStrike.

100%? IMHO, Microsoft has some culpability here because it still hasn’t provided a migration path for these sorts of apps outside of kernel space.

Nycturne · Jul 23, 2024

casperes1996 said:
Even with the full protection of user space, misbehaving processes can cause issues: excessive file writes, for example. Or, much to my annoyance, crashpad_handler, which often starts eating way too much CPU until it's killed, after which it relaxes for a while. So I guess another moral of this whole story is to try to avoid bad software.

No real disagreement, although I’d say that limiting the damage bad code can do is a worthy engineering goal in and of itself. We generally don’t go ‘oh well’ when earthquakes take down buildings or wind storms take down bridges; we put improved mitigations into action, backed by regulation.

Honestly, I’m getting a bit concerned that we don’t have some sort of software engineering ‘code’ similar to building codes yet. Clearly software infrastructure is becoming critical to people’s lives and the economy, but we’re still acting like it’s some infant industry that will fall over if we dare to put some guardrails in.

casperes1996 · Jul 23, 2024

Nycturne said:
No real disagreement, although I’d say that limiting the damage bad code can do is a worthy engineering goal in and of itself. We generally don’t go ‘oh well’ when earthquakes take down buildings or wind storms take down bridges; we put improved mitigations into action, backed by regulation.

Honestly, I’m getting a bit concerned that we don’t have some sort of software engineering ‘code’ similar to building codes yet. Clearly software infrastructure is becoming critical to people’s lives and the economy, but we’re still acting like it’s some infant industry that will fall over if we dare to put some guardrails in.

Fully agree. That’s also part of the point of address space layout randomization. If your program is written perfectly, randomizing its address space is pointless. And if it isn’t, there might still be a way of exploiting it even with a randomized address space. But it’s harder to do, and it may prevent some attacks outright or make others tricky enough to be infeasible.

I am in no way arguing that we might as well run everything in kernel mode. That’d be horrible. But code running with your user privileges can still delete all the data you have access to as a user. (Barring additional permission granularity systems, like on macOS with separate permissions for seeing documents, downloads, and full disk access on top of user privileges.)

casperes1996 · Jul 23, 2024

Ideally everything always runs with the least required privileges. I would be in favor of expanding the current user privilege system on Unix to a lattice structure where every program effectively spawns a new user that cannot see anything made by a sibling, while the user who spawned the programs is their shared upper bound: it can see all their data and grant them per-process pseudo-user sibling access if need be.

I guess all we need is a semilattice (a join-semilattice, since the spawning user acts as the join). No need for a greatest lower bound operation. Just spitballing.

Yoused · Jul 23, 2024

casperes1996 said:
I would be in favor of expanding the current user privilege system on Unix to a lattice structure where all programs effectively spawn a new user that cannot see anything made by a sibling, but the user who spawned the program is their shared upper bound that can see all their data and grant them per-process pseudo-user sibling access if need be.

Depends on your definition of "see". I tend to take a strict view on visibility: any given process should be allowed to decide which of the data it owns is visible to which other processes. A parent should not have full access to its children's data without approval, to keep the possible attack surface as small as we can.
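A toy sketch of this stricter model, for contrast: a parent gets no implicit access, and a process can read an item only if its owner exposed that item to it. `Process`, `store`, `expose` and `read_from` are hypothetical names; this is an in-memory illustration, not an OS mechanism.

```python
class Process:
    """Hypothetical process that owns data and decides who may read it."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent  # recorded, but it grants no implicit access
        self._data = {}
        self._readers = {}  # item key -> set of processes allowed to read it

    def store(self, key, value):
        self._data[key] = value

    def expose(self, key, reader):
        """Called on the owner to make one of its items visible to one reader."""
        self._readers.setdefault(key, set()).add(reader)

    def read_from(self, owner, key):
        """Read owner's item if self is the owner or the item was exposed to self."""
        allowed = owner._readers.get(key, set())
        if self is not owner and self not in allowed:
            raise PermissionError(f"{self.name} may not read {owner.name}:{key}")
        return owner._data[key]
```

Unlike the upper-bound proposal quoted above, the parent here is just another reader: a `shell` process cannot read its child's data until the child calls `expose` and names it.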

Andropov · Jul 23, 2024

Cmaier said:
100%? IMHO, Microsoft has some culpability here because it still hasn’t provided a migration path for these sorts of apps outside of kernel space.

I mean, I think Apple's approach is far superior. Things like System Integrity Protection and the entitlement system greatly minimize the damage a process with root privileges can do, and I believe it's the right choice for a mainstream OS. But it does come with limitations. One may have legitimate reasons to want to do something that is now (on macOS & iOS) gated behind an entitlement that Apple won't issue to you. Is this extra freedom worth significantly downgrading the security of the system? In my opinion, for an OS used by non-tech people, the answer is no.

The reason I felt this was 100% on CrowdStrike is that, while in this particular instance other operating systems did a better job, in the end even a buggy user space program can cause significant issues (significant enough to make the system inoperative). In this case it happened to be a crash, which on Windows took the entire system down because it was running in kernel space. But imagine if, instead of crashing, the faulty CrowdStrike process had started writing bytes to disk nonstop (until the disk was full). That would have caused major disruptions as well, and as far as I know no OS can do much to prevent that kind of issue. Or if it entered an infinite loop on several threads, pinning CPU utilization at 100% and effectively DoSing the system. Or a million other possibilities.

I can't imagine what happened behind the scenes to end up pushing untested changes to millions of devices at once, with no phased release (on a Friday, no less!). At that scale, you must phase releases and monitor them. Getting such a critical thing wrong is miles worse than having a bad privilege system in your OS.
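A minimal sketch of a phased release with a health gate, assuming hypothetical `deploy` and `crash_rate` callbacks supplied by the caller (a real system would wire these to its deployment tooling and telemetry, and wait a soak period between waves). The update goes out in cumulative waves and stops as soon as a wave's crash rate exceeds a threshold, so a bad update reaches a small slice of the fleet rather than all of it.

```python
import math
from typing import Callable, Sequence


def phased_rollout(
    fleet: Sequence[str],
    stages: Sequence[float],
    deploy: Callable[[Sequence[str]], None],
    crash_rate: Callable[[Sequence[str]], float],
    max_crash_rate: float = 0.001,
) -> int:
    """Deploy to growing waves of `fleet`, halting if a wave looks unhealthy.

    `stages` are cumulative fractions of the fleet, e.g. (0.01, 0.1, 1.0).
    Returns how many hosts ended up with the update.
    """
    done = 0
    for fraction in stages:
        target = min(len(fleet), max(1, math.ceil(len(fleet) * fraction)))
        if target <= done:
            continue  # this stage adds no new hosts
        wave = fleet[done:target]
        deploy(wave)
        done = target
        # In practice you would let the wave "soak" before checking health.
        if crash_rate(wave) > max_crash_rate:
            return done  # halt: the remaining hosts never get the update
    return done
```

With stages `(0.01, 0.1, 1.0)` on a 1,000-host fleet, an update that crashes every host it touches stops after the first 10 hosts instead of reaching all 1,000.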
