The CrowdStrike Outage and Market-Driven Brittleness

Friday’s massive internet outage, caused by a mid-sized tech company called CrowdStrike, disrupted major airlines, hospitals, and banks. Nearly 7,000 flights were canceled. It took down 911 systems and factories, courthouses, and television stations. Tallying the total cost will take time. The outage affected more than 8.5 million Windows computers, and the cost will surely be in the billions of dollars, easily matching the most costly previous cyberattacks, such as NotPetya.

The catastrophe is yet another reminder of how brittle global internet infrastructure is. It’s complex, deeply interconnected, and filled with single points of failure. As we experienced last week, a single problem in a small piece of software can take large swaths of the internet and global economy offline.

The brittleness of modern society isn’t confined to tech. We can see it in many parts of our infrastructure, from food to electricity, from finance to transportation. This is often a result of globalization and consolidation, but not always. In information technology, brittleness also results from the fact that hundreds of companies, none of which you’ve heard of, each perform a small but essential role in keeping the internet running. CrowdStrike is one of those companies.

This brittleness is a result of market incentives. In enterprise computing—as opposed to personal computing—a company that provides computing infrastructure to enterprise networks is incentivized to be as integral as possible, to have as deep access into their customers’ networks as possible, and to run as leanly as possible.

Redundancies are unprofitable. Being slow and careful is unprofitable. Being less embedded in customers’ networks, less essential to them, and having less access to their machines is unprofitable, at least in the short term, by which these companies are measured. This is true for companies like CrowdStrike. It’s also true for CrowdStrike’s customers, who didn’t have resilience, redundancy, or backup systems in place for failures such as this because those, too, are expenses that affect short-term profitability.

But brittleness is profitable only when everything is working. When a brittle system fails, it fails badly. The cost of failure to a company like CrowdStrike is a fraction of the cost to the global economy. And there will be a next CrowdStrike, and one after that. The market rewards short-term profit-maximizing systems, and doesn’t sufficiently penalize such companies for the impact their mistakes can have. (Stock prices depress only temporarily. Regulatory penalties are minor. Class-action lawsuits settle. Insurance blunts financial losses.) It’s not even clear that the information technology industry could exist in its current form if it had to take into account all the risks such brittleness causes.

The asymmetry of costs is largely due to our complex interdependency on so many systems and technologies, any one of which can cause major failures. Each piece of software depends on dozens of others, typically written by other engineering teams sometimes years earlier on the other side of the planet. Some software systems have not been properly designed to contain the damage caused by a bug or a hack of some key software dependency.

These failures can take many forms. The CrowdStrike failure was the result of a buggy software update. The bug didn’t get caught in testing and was rolled out to CrowdStrike’s customers worldwide. Sometimes, failures are deliberate results of a cyberattack. Other failures are just random, the result of some unforeseen dependency between different pieces of critical software systems.

Imagine a house where the drywall, flooring, fireplace, and light fixtures are all made by companies that need continuous access and whose failures would cause the house to collapse. You’d never set foot in such a structure, yet that’s how software systems are built. It’s not that 100 percent of the system relies on each company all the time, but 100 percent of the system can fail if any one of them fails. But doing better is expensive and doesn’t immediately contribute to a company’s bottom line.

Economist Ronald Coase famously described the nature of the firm (any business) as a collection of contracts. Each contract has a cost. Performing the same function in-house also has a cost. When the costs of maintaining the contract are lower than the cost of doing the thing in-house, then it makes sense to outsource: to another firm down the street or, in an era of cheap communication and coordination, to another firm on the other side of the planet. The problem is that both the financial and risk costs of outsourcing can be hidden (delayed in time and masked by complexity) and can lead to a false sense of security when companies are actually entangled by these invisible dependencies. The ability to outsource software services became easy a little over a decade ago, due to ubiquitous global network connectivity, cloud and software-as-a-service business models, and an increase in industry- and government-led certifications and box-checking exercises.

This market force has led to the current global interdependence of systems, extending far beyond their original industry and scope. It’s why flying planes depends on software that has nothing to do with the avionics. It’s why, in our connected internet-of-things world, we can imagine a similar bad software update resulting in our cars not starting one morning or our refrigerators failing.

This is not something we can dismantle overnight. We have built a society based on complex technology that we’re utterly dependent on, with no reliable way to manage that technology. Compare the internet with ecological systems. Both are complex, but ecological systems have deep complexity rather than just surface complexity. In ecological systems, there are fewer single points of failure: If any one thing fails in a healthy natural ecosystem, there are other things that will take over. That gives them a resilience that our tech systems lack.

We need deep complexity in our technological systems, and that will require changes in the market. Right now, the market incentives in tech are to focus on how things succeed: A company like CrowdStrike provides a key service that checks off required functionality on a compliance checklist, which makes it all about the features that they will deliver when everything is working. That’s exactly backward. We want our technological infrastructure to mimic nature in the way things fail. That will give us deep complexity rather than just surface complexity, and resilience rather than brittleness.

How do we accomplish this? There are examples in the technology world, but they are piecemeal. Netflix is famous for its Chaos Monkey tool, which intentionally causes failures to force the systems (and, really, the engineers) to be more resilient. The incentives don’t line up in the short term: It makes it harder for Netflix engineers to do their jobs and more expensive for them to run their systems. Over years, this kind of testing generates more stable systems. But it requires corporate leadership with foresight and a willingness to spend in the short term for possible long-term benefits.
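
The idea behind this kind of failure testing can be illustrated with a small fault-injection sketch. This is not Netflix’s Chaos Monkey, which deliberately causes failures in running systems; it is a hypothetical wrapper that makes a dependency fail at random so that the calling code is forced to handle the failure. The names `flaky`, `with_fallback`, and `failure_rate` are invented for illustration.

```python
import random
from typing import Callable, TypeVar

T = TypeVar("T")


def flaky(func: Callable[[], T], failure_rate: float, rng: random.Random) -> Callable[[], T]:
    """Wrap func so that each call raises ConnectionError with probability failure_rate."""
    def wrapper() -> T:
        # rng.random() returns a float in [0.0, 1.0).
        if rng.random() < failure_rate:
            raise ConnectionError("injected failure")
        return func()
    return wrapper


def with_fallback(primary: Callable[[], T], fallback: Callable[[], T]) -> T:
    """Call primary; if it raises ConnectionError, return the fallback's result instead."""
    try:
        return primary()
    except ConnectionError:
        return fallback()
```

Running a test suite with `failure_rate` set above zero would expose any caller that has no fallback path, which is the kind of weakness this testing is meant to surface before a real outage does.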

Last week’s update wouldn’t have been a major failure if CrowdStrike had rolled out this change incrementally: first 1 percent of their users, then 10 percent, then everyone. But that’s much more expensive, because it requires a commitment of engineer time for monitoring, debugging, and iterating. It can also take months to do correctly for complex and mission-critical software. An executive today will look at the market incentives and correctly conclude that it’s better for them to take the chance than to “waste” the time and money.
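
An incremental rollout of this kind might look like the following minimal sketch. The 1, 10, and 100 percent stages come from the paragraph above; everything else is an assumption made for illustration, including the names `staged_rollout`, `deploy`, and `healthy` and the rule of halting as soon as a stage reports an unhealthy host. It does not describe CrowdStrike’s actual release process.

```python
from typing import Callable, Sequence


def staged_rollout(
    hosts: Sequence[str],
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
    stage_percents: Sequence[int] = (1, 10, 100),
) -> list[str]:
    """Deploy to growing fractions of hosts, halting after the first unhealthy stage.

    Returns the hosts that received the update.
    """
    updated: list[str] = []
    for percent in stage_percents:
        # Total number of hosts that should be updated once this stage completes.
        target = max(1, len(hosts) * percent // 100)
        batch = hosts[len(updated):target]
        for host in batch:
            deploy(host)
            updated.append(host)
        # Check health before widening the rollout to the next stage.
        if not all(healthy(host) for host in batch):
            break
    return updated
```

Because the health check runs between stages, a bad update in this sketch reaches at most the fraction of hosts in the stage where it is first detected. The real cost lies in what `healthy` has to do: someone must decide what to monitor and how long to wait before trusting the result.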

The usual tools of regulation and certification may be inadequate, because failure of complex systems is inherently also complex. We can’t describe the unknown unknowns involved in advance. Rather, what we need to codify are the processes by which failure testing must take place.

We know, for example, how to test whether cars fail well. The National Highway Traffic Safety Administration crashes cars to learn what happens to the people inside. But cars are relatively simple, and keeping people safe is straightforward. Software is different. It is diverse, is constantly changing, and has to continually adapt to novel circumstances. We can’t expect that a regulation that mandates a specific list of software crash tests would suffice. Again, security and resilience are achieved through the process by which we fail and fix, not through any specific checklist. Regulation has to codify that process.

Today’s internet systems are too complex to hope that if we are smart and build each piece correctly the sum total will work right. We have to deliberately break things and keep breaking them. This repeated process of breaking and fixing will make these systems reliable. And then a willingness to embrace inefficiencies will make these systems resilient. But the economic incentives point companies in the other direction, to build their systems as brittle as they can possibly get away with.

This essay was written with Barath Raghavan, and previously appeared on Lawfare.com.

—————
Free Secure Email – Transcom Sigma
Boost Inflight Internet
Transcom Hosting
Transcom Premium Domains

Data Wallets Using the Solid Protocol

I am the Chief of Security Architecture at Inrupt, Inc., the company that is commercializing Tim Berners-Lee’s Solid open W3C standard for distributed data ownership. This week, we announced a digital wallet based on the Solid architecture.

Details are here, but basically a digital wallet is a repository for personal data and documents. Right now, there are hundreds of different wallets, but no standard. We think designing a wallet around Solid makes sense for lots of reasons. A wallet is more than a data store—data in wallets is for using and sharing. That requires interoperability, which is what you get from an open standard. It also requires fine-grained permissions and robust security, and that’s what the Solid protocols provide.

I think of Solid as a set of protocols for decoupling applications, data, and security. That’s the sort of thing that will make digital wallets work.

—————

Robot Dog Internet Jammer

Supposedly the DHS has these:

The robot, called “NEO,” is a modified version of the “Quadruped Unmanned Ground Vehicle” (Q-UGV) sold to law enforcement by a company called Ghost Robotics. Benjamine Huffman, the director of DHS’s Federal Law Enforcement Training Centers (FLETC), told police at the 2024 Border Security Expo in Texas that DHS is increasingly worried about criminals setting “booby traps” with internet of things and smart home devices, and that NEO allows DHS to remotely disable the home networks of a home or building law enforcement is raiding. The Border Security Expo is open only to law enforcement and defense contractors. A transcript of Huffman’s speech was obtained by the Electronic Frontier Foundation’s Dave Maass using a Freedom of Information Act request and was shared with 404 Media.

“NEO can enter a potentially dangerous environment to provide video and audio feedback to the officers before entry and allow them to communicate with those in that environment,” Huffman said, according to the transcript. “NEO carries an onboard computer and antenna array that will allow officers the ability to create a ‘denial-of-service’ (DoS) event to disable ‘Internet of Things’ devices that could potentially cause harm while entry is made.”

Slashdot thread.

—————

What Does “Connection is Not Private” Mean?

Have you ever visited a site that triggers a “your connection is not private” or “your connection is not secure” error message? Maybe you moved on. Or maybe you found yourself interested enough to continue anyway. Either way, understanding what the error means can keep you safer online. Knowing the risks, and how you can clear up the error, matters just as much.

Let’s take a look. 

What does “this connection is not private” mean? 

A “your connection is not private” error means that your browser can’t determine with certainty that a website has safe encryption protocols in place to protect your device and data. You can bump into this error on any device connected to the internet — a computer, smartphone, or tablet. 

Note that the “your connection is not private” error is Google Chrome’s phrasing. Other browsers might use “your connection is not secure” or some variation of that as the warning message. 

So, what exactly is going on when you see the “this connection is not private” error? 

For starters, the error is only a warning. It doesn’t mean any of your private info is compromised. A “your connection is not private” error means your browser couldn’t validate the SSL (secure sockets layer) security certificate of the website you were trying to visit, for example because the certificate has expired.

So, what’s an SSL certificate? Think of it as a digital certificate that verifies the authenticity of a website. It also lets your web browser establish an encrypted connection with the website you’re visiting. As you can imagine, an SSL-protected site is vital when it comes to banking, shopping, or sending secure info online.

You can spot an SSL-protected site by an address that begins with HTTPS, with the “S” standing for “secure.” Many browsers also drop a little padlock symbol in the address bar to call it out. Some have a button in the bar that you can select to see if the site is protected. 

Website owners must renew their certificates regularly to keep the site’s encryption up to date. If a website’s SSL certificate is outdated, it means the site owners haven’t renewed it in time, but it doesn’t necessarily mean they’re up to no good. Even major websites have had momentary lapses that served up the message.
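
For readers who want to see what “outdated” means in practice, here is a minimal sketch of an expiry check using Python’s standard `ssl` module. The function name `days_until_expiry` is invented for illustration, and the check covers only the expiry date, which is one of several reasons a browser can reject a certificate.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Return the days left before a certificate expires (negative if already expired).

    not_after is a certificate's "notAfter" value in the format used by
    ssl.SSLSocket.getpeercert(), for example "Jan  5 09:34:43 2030 GMT".
    now must be a timezone-aware datetime.
    """
    # cert_time_to_seconds converts the certificate time string to seconds since the epoch.
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - now).total_seconds() / 86400
```

On a live connection, the `notAfter` string is one of the fields in the dictionary returned by `getpeercert()`. Python’s default SSL settings refuse to connect to a site whose certificate has expired, which is the programmatic counterpart of the browser warning.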

While it doesn’t always mean a website is unsafe to browse, pay attention. Using a site without an SSL connection might make your personal data less secure. 

How to fix the “connection is not private” error

If you feel confident that a website or page is safe, despite the warning from your web browser, you can troubleshoot the issue a few ways: 

  • Refresh the page. Sometimes, the error is only a momentary glitch. Try reloading the page to rule out temporary errors. 
  • Close the browser and reopen it. Closing and reopening your web browser might also help clear a temporary glitch. 
  • If you’re on public Wi-Fi, think twice. Hackers often exploit public Wi-Fi because its routers are usually less secure and less well maintained. Some public Wi-Fi networks also interfere with SSL connections, for example by redirecting you to a sign-in page. That might result in the error you’re seeing.
  • Make sure your browser and operating system are up to date. Always keep your critical software and the operating system fully updated. An outdated browser can start getting buggy and can increase the occurrence of this kind of error. 
  • Check that you have the right website. Hackers and scammers often take advantage of misspellings or alternative URLs to try and snare users looking for trusted sites. Make sure you have the address and the site absolutely right. 
  • If it’s not you, it’s them. If you’ve tried all the troubleshooting techniques above and you still see the error, the problem is likely coming from the site itself. You’ll have the option to “proceed to the domain,” though we don’t recommend it. The bottom line is that you take your chances anytime you ignore an error like this. 

How to protect your privacy while online

Personal info like yours is valuable to hackers, so they take every chance they can to get their hands on it. Beyond sticking to visiting secure websites, you have several other ways you can protect yourself online. 

  • Delete unused browser extensions (and apps) to reduce your risk. The more apps you have, the more exposure you have to exploits and attacks. Moreover, out-of-date apps can have security loopholes in them. If you’re not using it, delete it, along with any data you have. 
  • Delete old accounts that still have your info. As it is with apps and browser extensions, the more you keep, the more exposure you have — in this case, to data breaches that can put your personal info in the hands of a hacker. A service like our Online Account Cleanup can identify and shut down those old accounts for you. 
  • Remove your personal info from sketchy data broker sites that sell it to anyone for a price. That includes everyone from advertisers to hackers, scammers, and spammers. Our Personal Data Cleanup scans data broker sites and shows you which ones are selling your personal info — and can help you remove it. 

The post What Does “Connection is Not Private” Mean? appeared first on McAfee Blog.
