10 Comments
The Chief Bunkum

I’m also a grumpy old fuck - though one who works in technology, and has worked quite a bit with AI.

Yes, people who work in technology who actually think well realize bullet-proof alignment isn’t possible. But if you believe most people who work in technology actually think well, you might be disappointed.

The bit of good news is that current “AI” isn’t really intelligent (despite many loud claims to the contrary by people who don’t think well). It’s all just probabilities, so it’s a little like a humongous, complicated spreadsheet.

But the bad news is that whereas spreadsheets are deterministic (the same inputs always produce the same outputs), AI (machine learning) intentionally adds randomness all over the place. And it's this intentional randomness that makes it literally, mathematically impossible to provide bulletproof alignment.

This isn't just philosophy, it's also math. The number of combinations of billions of input parameters with effectively infinite combinations of randomness means there is no way to align an essentially infinite number of outputs for even a specific, small, and well-defined set of inputs (like "tell me how to make a bomb"). All guardrails will have leaks.
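The randomness described above is easy to see in miniature. Below is a minimal, hypothetical sketch of temperature sampling as used in LLM decoding - the `sample_next_token` name and the toy three-token distribution are illustrative, not taken from any real model. The point is that the same input distribution can yield different outputs on different runs, so no guardrail over the inputs pins down the outputs.

```python
import random

def sample_next_token(probs, temperature=1.0, rng=None):
    """Sample one token index from a probability distribution,
    sharpened (T < 1) or flattened (T > 1) by temperature."""
    rng = rng or random
    # Re-weighting by p ** (1/T) is equivalent to softmax(logits / T)
    # up to normalization.
    weights = [p ** (1.0 / temperature) for p in probs]
    total = sum(weights)
    r = rng.random() * total
    cumulative = 0.0
    for i, w in enumerate(weights):
        cumulative += w
        if r < cumulative:
            return i
    return len(weights) - 1  # guard against floating-point rounding

# Toy distribution over three "tokens": identical input,
# yet repeated calls can return different outputs.
probs = [0.7, 0.2, 0.1]
outputs = {sample_next_token(probs) for _ in range(1000)}
# With 1000 draws, all three tokens almost certainly appear.
```

Deterministic decoding (greedy, temperature approaching 0) exists, but production systems generally keep sampling on because it produces better text - which is exactly the trade-off the comment describes.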

Like other technologies before it, AI will enable greater productivity (overall good), while pushing us to worship productivity more (overall bad). It will be ugly and uncomfortable for a while as we figure out how to balance it all out. But real alignment will come from societal changes and guardrails, not algorithmic ones.

Stefano

Thoughtful read, thanks!

Lately an issue I've been thinking a lot about is the gradual stratification of inbuilt complexity in our societies. We all operate in our daily lives based on assumptions about things functioning, services running, etc. As we increase automation of tasks, so too do we increase the risks of cascading failures.

I find it helpful to think about this in terms of the weather, where we have 1-in-10-year storms, 1-in-100, 1-in-1,000, etc. The greater the automation in a system, the less human input is required, which erodes the knowledge and capability available in times of crisis, when we need to rebuild from zero rather than on top of an already functioning complex system.

For instance, articles have been written elsewhere about military hardware (e.g. missiles) running on software from the 70s, with fewer and fewer people around who can operate these systems and, importantly, repair them in case of failure. The same, but different, applies to civil infrastructure: water treatment, electric grids, logistics, farming, etc. Methinks there are ever fewer hands-on technicians as tasks get more automated.

Since it's a matter of "when, not if" that AI will eventually manage all these systems, I really do wonder what will happen when a 1-in-500-year earthquake or storm knocks out critical infrastructure, leading to a cascade of failures, and swathes of people find themselves in the stone age for a prolonged period (with all the horror this will wreak) because we don't have people who can manually reset and operate systems locally.

If on top of this we add what you discuss in the essay - the morality of AI - we're going to be in double trouble during emergencies. If and when AI is taught real concepts such as "acceptable losses" (sunk costs?) or "prioritizing VIPs and critical infrastructure vs. saving the most lives possible," what's left of a normal complex system might not even respond to the needs of emergency workers operating under duress. The AI might even work against humans trying to get back "online," because that could compromise other parts of the system deemed critical. The AI might be operating perfectly cogently and coherently within its moral framework, much to the horror of people in dire circumstances.

I understand the criticisms that AI isn't AGI, is probabilistic, uses brute force, etc., but none of this inspires much confidence, inasmuch as it's perfectly logical to hand management of automated systems over to AI to make them even more efficient, without being able to foresee (and test for) unforeseen circumstances.

Great start to 2025! 🤣

Justin Ross

Haha right. Very encouraging thoughts, these.

This should probably be an essay all its own - cascading failures and the outsourcing of all management and critical-thinking activity to machines is a very, very real threat to our systems moving forward. It's a when-not-if that we will experience massive failures in our systems.

My mother has always worked in hospitals and has explained to me how, when the power goes out, they have a generator for essential systems only. Everything else goes back to note-taking, hand-made plans, and troubleshooting. She says most hospitals seem pretty well prepared for stuff like this, thank goodness... but most people in most industries probably aren't.

Jonas Haraldson

Totally agree - both with the main article and your reply. I might add another aspect to the somewhat dystopian outlook. Having been involved in the development of web-based services and apps, I have noticed that the final product or service always ends up being an amalgamation of functionality from many different entities: code libraries, embedded software development kits, cloud-based services, etc. These dependencies can be managed in good times, when all software is updated, documented, and properly integrated. However, in times of crisis, when companies cease operations, communication services break down, and bits and pieces stop working, it all becomes extremely fragile and hard to maintain, even for humans. Add the AI-based automation aspect to that and it becomes really scary.

Doctrix Periwinkle

What a lovely conclusion to this thoughtful essay. Thank you.

Justin Ross

My pleasure.

Chris James

“We should be out in the sun and helping to raise each other's kids.”

This is what I did today.

Justin Ross

That's really nice to hear.

Heidi Kulcheski

Fantastic article!

Justin Ross

Why thank you.
