I could not resist noting this today:
Microsoft might need to get the basics right before bragging about them.
It was nothing to do with Microsoft. It was due to a pushed update from CrowdStrike which obviously hadn’t been tested before being rolled out. It just happened that it brought down machines running Windows. It could just as easily have been Mac OS or Linux.
But it didn’t
So it did have a lot to do with Microsoft
Hi Richard,
Sorry to be pedantic, but it really did not have anything to do with Windows. Other companies that used Windows but not Crowdstrike were unaffected. The problem is that Crowdstrike was used by so many influential businesses; as others have commented, it is an outsourcing issue rather than one with the OS specifically.
Regards
But in that case my point still stands
Microsoft should have warned them of the risk of Crowdstrike and did not
Or it should have provided a better alternative
Of course it did.
Thank you and well said, Richard.
Bit busy at work today, but I will try to comment later as I worked with the Bank of England on such matters in 2021 and have done a bit this year.
With regard to Microsoft and Gates, this is how Gates buys silence: https://gript.ie/bill-gates-bankrolled-select-media-outlets-to-the-tune-of-319-million-including-the-uks-guardian-and-the-bbc/. The Gates Foundation*, a “tax efficient” investment vehicle pretending to be a charity, has placed many employees in media, government and academia. Hacks and other professionals want to work with or for Gates. US foundations are required to contribute 3% to charity annually.
Richard,
You are correct! To put this in “construction” terms: Microsoft is the General Contractor and Crowdstrike is the Sub-contractor. A General Contractor is responsible for the work of the Sub-contractor they engage.
If a plumbing contractor installs four perfect bathrooms and one sub-standard bathroom for a general contractor, the sub-contractor (plumber) is at fault but the responsibility is on the General Contractor.
Do not forget that the Crowdstrike update was a kernel module (thus powerful enough to do anything it liked to any computer on which it was installed) which was ‘signed’ as legitimate **by Microsoft**. With malicious intent the outage could have been far worse, and apparently Microsoft has no system in place to prevent that.
You have answered the question I asked about the relationship between the two companies. Clearly Micro$oft must share some of the blame.
There is a great deal of pedantry, special pleading and repressed legalism here. This is a consequence of the interconnectedness of complex, insecure systems; a consequence of innovation in Big Tech which demonstrates principally that the innovators do not know what they are doing, because there is insufficient restraint, built-in redundancy or testing (all of which is inevitable, because it is endlessly repeated – it isn’t new, the disasters simply become bigger, and the disorder greater). What I see here is a defence built out of complacency. The law runs far, far behind innovation; and that will take decades to resolve – too late for everybody.
Adam Ferguson explained the problem in 1767. We never, ever learn: “EVERY step and every movement of the multitude, even in what are termed enlightened ages, are made with equal blindness to the future; and nations stumble upon establishments, which are indeed the result of human action, but not the execution of any human design. If Cromwell said, That a man never mounts higher, than when he knows not whither he is going; it may with more reason be affirmed of communities, that they admit of the greatest revolutions where no change is intended, and that the most refined politicians do not always know whither they are leading the state by their projects”.
In the 21st century, replace the word “state” or “nation” or “community” in Ferguson with “business” or “digital revolution” and you will realise that States now bend to technology, and are its servants; but we remain in exactly the same predicament that Ferguson observed when he gifted us a prophetic insight into our present and future.
CrowdStrike does have versions for other operating systems, and these have been causing problems too:
See: “CrowdStrike broke Debian and Rocky Linux months ago, but no one noticed”
https://www.neowin.net/news/crowdstrike-broke-debian-and-rocky-linux-months-ago-but-no-one-noticed/
My attitude would be that if you are foolish enough to install proprietary software on a Debian system then you deserve anything that happens as a consequence. Asking around a bit, I found out that CrowdStrike messes with the Linux kernel. Software that is allowed to do that is absolutely verboten as far as I am concerned. Indeed the standard security tools one would run on a Debian system go to great efforts to ensure that such things cannot happen by accident.
I suppose the real question is what is the relationship between Micro$oft and CrowdStrike? Micro$oft obviously can’t physically stop users installing software on their operating systems. But do they actually recommend CrowdStrike? If not, I can’t see how they can share any of the blame for what happened.
Actually it did affect some Linux systems: https://www.theregister.com/2024/07/21/crowdstrike_linux_crashes_restoration_tools/
Linux even has a Blue Screen of Death now thanks to systemd…
Mine doesn’t.
“Microsoft should have warned them of the risk of Crowdstrike and did not”
It would have come as a surprise to Microsoft as well. Crowdstrike is the name of a company that provides cybersecurity software. It isn’t part of MS. The fact that they didn’t do a controlled roll-out of the update, which would have highlighted issues sooner rather than later, is puzzling.
MS can’t be held responsible for a third party’s incompetence.
I never thought, in over 40 years in the IT business, that I would be defending MS!
But Microsoft says it can do cybersecurity
My whole point is that it obviously could not
What did I get wrong?
I see your point.
I also wasn’t aware that this was an update to the kernel which does raise a lot of questions as to why it wasn’t tested properly and rolled out in a phased manner.
I hope Crowdstrike have good insurance cover.
The CrowdStrike software update affected the kernel – a highly sensitive and protected part of the Windows operating system. Normal third-party “user mode” software updates cannot access this part of the operating system unless system administrators in the companies installing CrowdStrike explicitly override the built-in safety controls by giving the software “kernel level” access – which is admittedly common practice for security software. So, like most disasters, this comes down to insufficient risk management and change control on the part of the companies sourcing CrowdStrike: with the benefit of hindsight, system admins should have evaluated the huge impact a faulty update could cause and implemented mitigations such as a staggered update schedule or redundant systems that could be safely booted. Easy to say after the event. I personally do not think managing this risk was really MS’s responsibility once the system admins handed CrowdStrike kernel super powers. In my experience, after twenty years in the biz, the software / IT industry is still professionally immature, and we do not have the frameworks and institutions of other professions needed to manage the very significant risks of IT properly – let alone the emerging risks of AI!
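To make the “staggered update schedule” mentioned above concrete, here is a minimal Python sketch. Everything in it is hypothetical: the host names, the `apply_update` and `is_healthy` callbacks and the `max_failures` threshold are illustrations only, not anything CrowdStrike or Microsoft actually provides. The idea is simply that an update goes to a small first wave, and later waves are only touched if the earlier one comes back healthy.

```python
from typing import Callable, List


def staged_rollout(
    waves: List[List[str]],
    apply_update: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    max_failures: int = 0,
) -> List[str]:
    """Apply an update wave by wave; stop as soon as a wave looks unhealthy.

    Returns the list of hosts that actually received the update.
    """
    updated: List[str] = []
    for wave in waves:
        failures = 0
        for host in wave:
            apply_update(host)
            updated.append(host)
            if not is_healthy(host):
                failures += 1
        if failures > max_failures:
            # Halt here: the remaining waves never receive the bad update.
            break
    return updated


# Hypothetical fleet: one canary machine first, then everything else.
waves = [["canary-1"], ["branch-1", "branch-2"], ["hq-1"]]

# A broken update that leaves every machine unhealthy only reaches the canary.
print(staged_rollout(waves, lambda host: None, lambda host: False))
```

With a faulty update only `canary-1` is affected; with a healthy one all four hosts are updated in order. A real deployment would also need a waiting period between waves, which is left out here.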
Thanks
But it still says MS did not properly assess the risks to cybersecurity, and that is all I suggested.
“But Microsoft says it can do cybersecurity”
Well, they would say that wouldn’t they?
The problem here, Mr Knowles (and I use your comment as an illustration of our problem, not to ‘skewer’ you), is that instead of starting with a question in a complex area (or an exploration), you began with an authoritatively insistent, dismissive answer.
Our problem is not too many questions of authority (commercial or government); but too few.
I have been much disturbed by Pat McFadden’s muddled, unsatisfactory and inadequate statements at the Post Office Horizon Enquiry. Ed Davey was hapless. Neither seems to understand the task of government; in Harry S Truman’s words “the buck stops here”. McFadden’s apologetics are neatly summed up an hour into his witness statement, in answer to the challenging question, responding “we have no independent information”; if you weren’t to rely on the Civil Service, what effect would that have on the business of government? McFadden answered: “Trust in what you are told ….. soon you could see it is difficult to govern; … trust is at the heart of how this system works”. Where does accountability lie? McFadden admits, if it is State owned the responsibility is with Government, but Government is trying to make the business of the state owned body independent. The profound confusion at the heart of this mess could scarcely be clearer.
McFadden’s sense of trust is astonishing. He later said, “blind faith from the Post Office in their IT system turn[ed] into something more sinister where people simply were just not telling the truth”. But he thinks blind faith IN the Post Office is/was just fine for the Ministers and the Civil Service; and not least by maintaining blind faith in themselves. This from a politician; a sphere of life in which lying is routine; a routine to which he must surely have been long exposed.
There is an ability to challenge the assumption of trust; a challenge that is a necessity for Ministers, and essential to good Government. Clearly since 2010 something happened, but not enough; not nearly enough. The Shareholder Executive of McFadden’s day is now UK Government Investments, and in its Annual Report it records this activity: “The ARC [Audit and Risk Committee] works with senior management, the Board, the Government Internal Audit Agency (GIAA) and the National Audit Office (NAO) in carrying out its mandate”. In addition, there is an external auditor: “UKGI has appointed the Comptroller and Auditor General as its external auditor. The National Audit Office carries out the audit for and on behalf of the Comptroller and Auditor General. The remuneration paid to its external auditor for work relating to this financial year was £47,250 plus VAT, and an additional fee for work relating to 2021-22 for £7,000 plus VAT, (2021-22 £40,000). No non-audit work was undertaken by the auditors”.
GIAA was appointed in 2015. None of this frankly desultory-looking activity (largely by Whitehall insiders) appears to have been of much benefit to the Postmasters/Postmistresses in nine years. It is not even clear that the GIAA or NAO are adequately equipped to carry out the kind of forensic role required (backed by all the legal or prerogative power needed to lay conditions bare, if or when required, directly acting for government; Customs & Excise possess the kind of powers to which I refer). So there is nothing revolutionary here; it just requires politicians capable of thinking about the right issues, and acting effectively – and not just playing the system. Everybody should watch McFadden and Davey at the Horizon Enquiry. The closed minds and lack of imagination are striking. Already, this is the public’s wake-up call; the public elected them, and now they need to shape up – a lot better than this demonstration. So the public should be aware now. If blind faith is the best they can do, they can’t be trusted.
John
Most auditors are very young and very junior, struggling to understand why they are being asked to undertake the tasks entrusted to them, and in my experience often not doing so. There is no chance of critical thinking there, and the audit profession is not known for it.
@Ben Butchart. My impression is that companies like Micro$oft have a band-aid approach to security. Basically they wait until something goes wrong and then put a sticky plaster over it. In the early versions of Windows, there was no distinction between users, everyone had access to everything. What we have now is essentially a games machine operating system with a whole pile of sticky plasters.
Personally I would like to see all software licensed under the GPL-3, but failing that, companies that want their software used on machines connected to the public internet should, at least, be required to publish their source code. Failure to do this leads to a sloppy attitude to security whereby you think that just because security flaws are not immediately visible, they don’t matter. There simply is no incentive to deal with anything but the most obvious security problems.
I am not looking to auditors alone for this. I am looking for an independent organisation, independent of the Civil Service structure; but that is called on by Government to investigate State Owned institutions, whenever called on (and that would include any new bodies where the State may require to intervene; and to provide advice on the structure of arrangements between Government and institution, and appropriate structures for the institution’s management); that has highly trained lawyers, forensic accountants, statisticians (and auditors – doing the job they are supposed to do)*; but multi-disciplinary; with experienced managers from industry, skilled negotiators, organisational psychologists, and able to call on other independent advice, say in engineering, IT/AI etc., (most of them poachers turned gamekeepers, and so on – but weeding out the self-important, failed gamekeepers I often used to see in some consultancies, representing the supposed voice of experience).
This institution’s work could be extended to provide advice to government on the water industry, or railways or energy; and the restructuring of the relationship between Government and major monopolies. These are resources we badly need on an organised, structured basis, but lack; and we leave it to politicians, who are there to protect democracy but who, without the knowledge resource or basic professional expertise, are quite frankly out of their depth (and, generally, so are our Administrative Class Civil Servants).
* I deliberately left out economists, because they contribute so little; but maybe some of the younger ones – at least to correct the flow of ill-judged misinformation provided by so much of mainstream economics.
I think that would be intensely powerful
It is what the NAO should be doing as part of its service
Nothing wrong IMO. Micro$oft almost certainly “signed” it, which would appear to any reasonable user to imply that they authenticated its safety. But from a quick look at the M$ signature options it seems that their ‘signature’ is in practice no more than a purchasable endorsement, worth about as much as a blue tick on Twitter/X.
But you still all miss my point
Why in that case would you rely on them for your cybersecurity?
The NAO is doing the external auditor work on UK Government Investments (UKGI), it appears for a total of around £50k. I am trying to do a thought experiment; what sort of audit can you buy for £50k?
Here is what UKGI do:
Corporate governance – Act as a shareholder representative for, and lead establishment of, UK government’s most complex and commercial arm’s length bodies (ALBs) on behalf of sponsor departments
Corporate finance – Advise on major UK Government corporate finance matters, including financial interventions into corporate structures and corporate finance negotiations
Contingent liabilities – Advise on and analyse the UK Government’s contingent liabilities
Government Corporate Transactions – Advise on major UK Government corporate finance matters, including financial interventions into corporate structures and corporate finance negotiations
External Audit for £50k. Ignoring the large sums sunk in the Government Investments, or in its contingent liabilities (and ignoring its involvement in such issues as the sale of the NatWest investment); UKGI has total operating expenditure of £23.6m.
As far as I can see, the NAO is the only external audit input. I must be missing something …….. I really hope I am missing something.
What audit can you buy for £50k? One of no value?
“Why in that case would you rely on them for your cybersecurity? ” I don’t and wouldn’t. But technically semi-literate managers lobbied endlessly by self serving ‘consultants’ and ‘IT advisers’ with a vested interest in flogging M$ product most likely don’t read the small print or fully understand the implications if they do. But they probably do understand the career risks of doing something different. Back when IBM ruled the world the catch phrase was ‘No-one ever got fired for buying Big Blue’. Now, who gets fired for buying M$ ?
But my point still stands……
@Bob Thomlinson
“But from a quick look at the M$ signature options it seems that their ‘signature’ is in practice no more than a purchasable endorsement, worth about as much as a blue tick on Twitter/X.”
This is typical of the way software companies avoid responsibility for everything. I have not read the license for the CrowdStrike software but I wager it makes the user responsible for anything that goes wrong.
It is partly Microsoft’s fault. The reason is that for Crowdstrike to work, it has to have privileged access to the underlying operating system (Windows), in order to monitor for anomalies. Obviously Microsoft doesn’t want to allow just anyone to distribute software with such access, so they cryptographically sign any such kernel extensions – in other words they effectively warrant that the 3rd party software is safe to install in Windows.
Microsoft should do enough testing internally on any such 3rd party kernel extension, rather than relying just on the 3rd party who obviously have commercial incentives to say everything is okay. It looks like the update was just a data file that was not handled by the existing code safely, causing the computers to crash with BSOD. I recall Apple phones had a similar problem previously with crafted text messages that contained data that hadn’t been tested for. It is easy to miss this stuff.
Still, most of the blame must be on Crowdstrike, who should never release updates without appropriate testing. I have heard a suggestion that it was a logo graphic or font change, so they didn’t feel the need to go through a full test cycle. That may not be the actual case, but it is clear that Crowdstrike have missed a test case (as well as Microsoft).
Thanks
I agree with what others have said, that crowdstrike is the root cause of the outage. But Microsoft does share some blame:
* That their product is so lacking that companies feel the need to install third party security and management software in the first place
* That their product is vulnerable enough that a single dodgy driver can collapse the entire system
But I would argue there is another entity that takes some of the blame: the market. We essentially have 2 major operating systems in commercial use. (I’m not counting iOS, MacOS, android, or chromeos, because practically no-one uses those in customer-facing kiosks or behind-the-scenes cloud systems). We have linux and windows. So whenever one of those faces a fault, half of the world’s systems go dark.
I’m not really sure what the solution to that is. My personal and political preference would be to see greater investment in Linux and other open source projects. If we’re going to have a duopoly, we might as well invest and tighten up security in the choice that we can actually use and modify freely, rather than continuing to support a company that wouldn’t exist were it not for its own inertia.
Google was founded by open-source idealists; but the dot-com bust changed everything.* The problem with the lack of redundancy or testing is a function of the short-term pressure of the drive for profit and quick results. We therefore acquire innovation at incalculable cost; but the profit makers factor out the failure, and never look back at the chaos created or pick up the bill (they know the law will never catch up with the technology, or catch up effectively; and meanwhile they have the money to spin a tale of how wonderful they make the world, even as they comprehensively trash it). And they make so much money trashing it that the politicians become their gophers. We can see that here and in the US easily enough.
* Shoshana Zuboff, ‘The Age of Surveillance Capitalism’, charts how it happened.
The point is made today that “this catastrophe wasn’t Microsoft’s fault but it is Microsoft’s problem. It’s a problem that Apple has addressed, and Microsoft has a lot to answer for in not making Windows more robust in the way that macOS is now.”
For a clear explanation, see Howard Oakley at:
https://eclecticlight.co/2024/07/22/could-our-macs-be-crowdstruck/
Microsoft certified the crowdstrike kernel code through its testing labs. However whilst the code was approved way back and hadn’t changed Microsoft failed to appreciate that the code executes other code (often called pseudo code) which is presented as ‘configuration’ or a ‘data’ file at each update release.
So whilst strictly speaking the certified code hadn’t changed it does however behave differently according to the associated ‘data’ file that gets released periodically.
The data file released last week was ‘corrupt’, and it caused the Microsoft certified crowdstrike code to run corrupted pseudo code and behave in a way it wasn’t tested for. The certified code failed. When code in the kernel fails, it halts and paints the blue screen.
In this scenario you simply remove the released ‘data’ file for this week which is found in a folder under Windows. Restart the system and all should be fine.
So, Richard is right. Microsoft tested and approved third party kernel code without understanding how it actually worked. They approved it to run in the highly privileged part of the OS, the kernel. It ran according to the pseudo code in the ‘data’ file, which was not part of certification testing. It crashed, causing huge problems across the globe.
If that isn’t Microsoft’s error I don’t know what is.
Thanks
@Sanjay. CRC checks are fine for checking for possible corruption in non critical communication but they are not how you do security.
To take a simple example, the last character of your National Insurance number is a CRC check. It is useful for detecting whether I have typed in a valid NI number, but it can’t detect whether I have typed in the correct NI number – e.g. mine and not my brother’s.
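The difference between a checksum and a security check can be shown in a few lines of Python. This is only an illustrative sketch: the payloads and the shared key are made up, and HMAC is used here as one example of a keyed check, not as a description of what any vendor does. A CRC catches accidental corruption, but anyone can recompute it for altered data; a keyed check cannot be recomputed without the secret.

```python
import hashlib
import hmac
import zlib


def crc_ok(payload: bytes, crc: int) -> bool:
    """True if the payload matches the CRC-32 value shipped with it."""
    return zlib.crc32(payload) == crc


def mac_ok(payload: bytes, tag: bytes, key: bytes) -> bool:
    """True if the payload matches the HMAC-SHA256 tag made with the secret key."""
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)


KEY = b"hypothetical-shared-secret"  # assumption: known only to sender and receiver
original = b"update-v1"
crc = zlib.crc32(original)
tag = hmac.new(KEY, original, hashlib.sha256).digest()

# Accidental corruption (a single flipped bit) is caught by the CRC.
print(crc_ok(b"update-v0", crc))  # False

# Deliberate tampering: the attacker just recomputes the CRC, and it passes.
forged = b"malicious"
print(crc_ok(forged, zlib.crc32(forged)))  # True

# Without the key the attacker cannot produce a tag that verifies.
attacker_tag = hmac.new(b"attacker-guess", forged, hashlib.sha256).digest()
print(mac_ok(forged, attacker_tag, KEY))  # False
```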
But what is puzzling me is what the digital signature is supposed to be for. The normal protocol, at a minimum, would be something like this:
1. Download the digital signature for the updated software from a secure server belonging to whoever is signing the software, in this case M$.
2. Download patches for the new version from the software vendor.
3. Recreate the signature locally and check that it is the same as the downloaded version.
Doing this means you can be reasonably sure the software has been updated correctly to a version that has been checked by the signer.
From what has been said it seems that nothing like this is happening, hence my question: what is the point of M$’s digital signature?
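The three steps above can be sketched in Python as follows. This is a simplification under stated assumptions: a SHA-256 digest stands in for the “signature” (real code signing uses public-key signatures, which need more than the standard library), and the byte strings stand in for the two downloads. The function names are hypothetical.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Digest of the update as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_update(update: bytes, published_digest: str) -> bool:
    """Step 3: recreate the digest locally and compare it with the signer's copy."""
    return hmac.compare_digest(sha256_hex(update), published_digest.lower())


# Step 1 (assumed): the digest fetched from the signer's secure server.
checked_build = b"sensor build 1.0"
published = sha256_hex(checked_build)

# Step 2 (assumed): the bytes actually downloaded from the vendor.
print(verify_update(checked_build, published))        # True: same bytes the signer checked
print(verify_update(b"\x00" * 16, published))          # False: not what was signed
```

A check like this only covers the bytes that were actually signed; anything delivered separately afterwards is outside it.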
The idea that engineers at M$ testing labs would be unaware that software runs differently when you change a configuration file is absurd. I was teaching how to do mutation testing before M$ even existed.
But what you are saying suggests something even more worrying. I had assumed that, since the CrowdStrike software was meant to enhance security, the “signature” would contain something like a sha256 digest of the version being used and there would be some mechanism for stopping a corrupted version being installed.
If it doesn’t do that what is the point of the “signature”? And doesn’t this open users up to man-in-the-middle attacks?
@bernard hurley
I don’t know enough detail about the data file, but a man-in-the-middle attack would theoretically be possible if there aren’t sufficient checks to ensure an unmodified ‘data’ file is being read. Such checks are easy enough and I suspect they do do them.
Here they have unintentionally released an erroneous (aka ‘corrupt’) ‘data’ file through their normal file release process which likely includes a crc stamp and has hashed data.
Apparently the ‘data’ file contained all zeros.
On a wider level, in my view, our dependence on IT is now getting dangerous and catastrophic problems aren’t being designed against. On the other hand these are becoming harder and harder to design against as we rely more and more on tech.
“Apparently the ‘data’ file contained all zeros.”
That’s pretty gross to have been acted on without any internal validation.
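For illustration, the kind of internal validation being asked for need not be elaborate. The Python sketch below assumes a made-up file layout (a four-byte `CFG1` header followed by a payload); the real file format is not described in this thread, so treat every name here as hypothetical. The point is that an empty or all-zero file is rejected before anything acts on it, and the last known-good file is kept instead.

```python
MAGIC = b"CFG1"  # hypothetical header, not the real file format


def validate_content_file(blob: bytes) -> None:
    """Raise ValueError if the file is obviously not a usable data file."""
    if not blob:
        raise ValueError("file is empty")
    if not any(blob):
        raise ValueError("file contains only zero bytes")
    if not blob.startswith(MAGIC):
        raise ValueError("missing expected header")
    if len(blob) == len(MAGIC):
        raise ValueError("header present but no payload")


def load_or_keep_previous(blob: bytes, previous: bytes) -> bytes:
    """Use the new file only if it validates; otherwise stay on the old one."""
    try:
        validate_content_file(blob)
    except ValueError:
        return previous
    return blob


good = MAGIC + b"rules"
zeros = bytes(64)  # 64 zero bytes, like the reported bad file

print(load_or_keep_previous(zeros, good) == good)  # True: the bad file is ignored
```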