Saturday, July 20, 2024

The Fragility of Our Critical Infrastructure Exposed (Again)



Yesterday's global outage triggered by a faulty software update from CrowdStrike is a stark reminder of the fragile state of our critical infrastructure. This incident, which affected many Fortune 500 companies, underscores the alarming vulnerability of our essential systems to errors and cyber threats. The chaos that ensued—from canceled flights to darkened billboards in Times Square—highlights the urgent need for a more resilient infrastructure.

A Single Point of Failure

The incident showed how a single point of failure in a widely deployed software component can cascade into widespread chaos. The defective CrowdStrike update crashed computers running Microsoft's Windows operating system, with far-reaching consequences. Thousands of flights were canceled or delayed, emergency services and court systems were disrupted, nonessential surgeries were postponed, and even New York City's Times Square billboards went dark. This raises serious questions about the resilience of our infrastructure and our overdependence on a few key players in the tech industry.

The Dangers of Centralization


When so much critical infrastructure depends on a handful of companies, we put our entire economy at risk. The reliance on major cloud vendors and their partners like CrowdStrike has created a scenario where a single faulty update can have devastating consequences. To mitigate this risk, we must:

  • Implement diverse technological solutions to reduce dependence on single vendors.
  • Foster competition among providers to encourage innovation and redundancy.
  • Conduct rigorous, independent testing of critical software components.
  • Add further testing and fail-safes to patch management.
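
One way to picture a patch-management fail-safe is a staged rollout: push the update to a small slice of machines first, check their health, and only then widen the release. The Python sketch below is illustrative only. The names `staged_rollout`, `apply_update`, and `is_healthy` and the stage sizes are assumptions made for this example, not a description of how CrowdStrike or any other vendor actually ships updates.

```python
from typing import Callable, Iterable, List


def staged_rollout(
    hosts: List[str],
    apply_update: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    stage_fractions: Iterable[float] = (0.01, 0.10, 0.50, 1.0),
) -> List[str]:
    """Apply an update in growing waves and stop at the first unhealthy wave.

    Returns the hosts that received the update. All names are hypothetical.
    """
    updated: List[str] = []
    for fraction in stage_fractions:
        # Cumulative number of hosts that should be updated after this wave.
        target = max(1, int(len(hosts) * fraction))
        wave = hosts[len(updated):target]
        for host in wave:
            apply_update(host)
            updated.append(host)
        # Fail-safe: halt the rollout if any host in this wave is unhealthy.
        if not all(is_healthy(host) for host in wave):
            break
    return updated
```

With 100 machines and a bad update, this sketch stops after the first wave, so one machine is affected instead of all 100. A real process would also need a rollback step, which is omitted here.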

Major Vendors' Efforts and Metcalfe's Law

Major vendors are spending millions to prevent such outages, investing heavily in security measures, redundancy, and fail-safes, but we continue to experience major outages. For instance, Microsoft has allocated significant resources to enhance its cybersecurity infrastructure (Microsoft, 2021), and Amazon Web Services (AWS) continuously invests in improving its resilience (Amazon Web Services, 2021). However, is it enough?

Applying Metcalfe's Law, which states that the value of a network is proportional to the square of the number of connected users or devices, we can see how interconnected our infrastructure has become. The same interconnection that creates value also creates risk: when a single vendor relies on multiple components, every dependency is an opportunity for failure or exploitation. Because each new component can interact with every existing one, the number of potential points of failure grows roughly with the square of the component count, making our infrastructure more vulnerable to errors and cyberattacks. For example, the 2016 Dyn cyberattack exploited interconnected systems, leading to widespread internet outages (Zetter, 2016).
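
To put rough numbers on that growth: among n components there are n(n-1)/2 possible pairwise links, which scales with the square of n. The short Python sketch below counts them. The function name `pairwise_links` is just an illustrative choice, and counting every possible pair is a simplifying assumption, since real systems do not connect every component to every other.

```python
def pairwise_links(n: int) -> int:
    """Number of distinct pairs among n components: n * (n - 1) / 2."""
    return n * (n - 1) // 2


for n in (2, 10, 100, 1000):
    print(f"{n:>5} components -> {pairwise_links(n):>7} possible links")
```

Going from 10 to 100 components raises the possible links from 45 to 4,950, a 110-fold increase for a 10-fold increase in components.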

A Call for Robust Review and Redundancy

We must scrutinize critical software components with the same rigor as other critical infrastructure, ensuring they have:

  • Multiple layers of redundancy to prevent single points of failure.
  • Fail-safes to minimize the human impact of errors or cyberattacks.
  • Regular security audits to identify vulnerabilities.

The Need for Action

The incident at CrowdStrike serves as a critical warning. We cannot afford to be complacent and trust that a fix will always be deployed in time to mitigate damage. Instead, we must proactively develop and implement strategies to ensure continuity and resilience.

Final Thoughts

The outage caused by CrowdStrike's faulty update is more than just a technical glitch; it is a call to action. Our dependence on a few major players for critical software components is a vulnerability that errors or malicious actors can and will exploit. We must act now to build a more resilient and diversified technology infrastructure capable of withstanding and quickly recovering from such disruptions. My next post will explore a possible framework for local businesses to consider.

Comments, feedback, suggestions, and other viewpoints are always encouraged.

Wednesday, July 3, 2024

The 248th anniversary of the Declaration of Independence being adopted by the Second Continental Congress.

This week marks the 248th anniversary of the Second Continental Congress adopting the Declaration of Independence.

Over the years, the United States has grown from 13 colonies with around 2.5 million people to 50 states and 14 territories, now home to over 330 million people. Our economy has expanded to over $27 trillion. Public health advancements have drastically reduced child mortality rates from over 45% to less than 1%, and our average lifespan has increased by more than 35 years.

American scientific achievements have given us the light bulb, modern flight, the internet, air conditioning, movies, and the polio vaccine. We have over 2.7 million miles of power lines and more than 4 million miles of paved roads, with over 90% of households having access to broadband internet. The US has also led the way in space exploration, with more than 800 human spaceflights—more than any other country.

To the 15% on the radical right and the other 15% on the liberal left, remember we are better together as a nation when we focus on real issues and find ways to work together. While there are always challenges and room for improvement, I think we've made pretty good progress since our founding. So, grab a hot dog and your favorite drink—here's to the next 248 years.
