AI Safety: Theological and other thoughts

Isha Yiras Hashem

Feb 12

*Not By A Certified Programmer*

11 Comments

Gunflint

Feb 12

You have more patience than me. I’ve never read one of Zvi’s posts in its entirety.

Isha Yiras Hashem

Feb 13

Well, I needed to make up for my complete ignorance somehow. This was a lot easier than actually learning programming.

Gunflint

Feb 13

I’m pretty well informed as a programmer, but really, Zvi needs an editor.

Ben Hoffman

Feb 24

Ask Claude to summarize, & then expand on the bits you care about.

Springmeadow

Feb 13

Lots to think about!

Ben Hoffman

Feb 23 (edited)

The benefit of a well-formulated private prayer is that it informs the one praying about what they hope for. The benefit of a well-formulated common prayer is that it also enables coordination towards that goal. The last time I spoke seriously with Zvi about AI alignment, I pointed out that it seems like the focus is almost entirely on what we want to prevent the AI from doing, and if he wanted to nudge AI labs like Anthropic in a better direction, he might want to think about what AI might empower humans to do, to become more fully ourselves, separately and together (and thus, secondarily but importantly, to better manage rapidly increasing capacities to automate complex information processing using digital computers).

Consider Nebuchadnezzar. We're not just asking for an impossible task (understand dreams we haven't told you); we might be talking to the wrong entity entirely. Just as Nebuchadnezzar addressed his demands to court magicians who didn't accurately represent the mind of God, we might be implementing safety measures that address an AI's verbal behavior while missing the actual locus of agency within the system. The problem with "deliberative alignment" as implemented on current language models - which seems similar to MIRI's idea of "corrigibility" - is that the models themselves are opaque. While we can make them disgorge intermediate text products "thinking" about what they're going to write, this doesn't make the neural network itself interpretable. Interpretability is probably necessary for corrigibility.

A child might develop verbal strategies for deflecting unwanted scrutiny that involve acknowledging and endorsing the "rules" while continuing to pursue forbidden goals like investigating the power outlet or finding out what those bright detergent pods taste like. Similarly, an AI system might produce text demonstrating "deliberative alignment" that doesn’t correspond at all to an agent trying to describe its intentions. It’s not just that the AI might be strategically deceptive, but that our very concept of "the AI" might be misidentifying where agency and optimization are actually occurring in the system. In alignment research, an agent that emerges accidentally from a larger system of optimization is called a “mesa-optimizer.” Humans have a related problem - constraints on the stories we tell don't always constrain all our behavior accordingly. Sometimes, the personas we present to each other have little or no resemblance to the agents we really are, like when people at job interviews pretend not to be motivated by money, but instead to just really be excited to work at BigCorp.

When California caught fire in 2018 and I drove across the country looking for signs of collective intellectual life, I visited Arcosanti, the first arcology. Architect Paolo Soleri invented the idea of the "arcology" as a planned self-contained city, to solve problems of suburban alienation. We might think of Levitt & Sons, the Eisenhower regime, and Robert Moses as builders who tried to give us ways to better do what we already collectively understood ourselves to be attempting (get places fast, live somewhere comfortable with plenty of space, have nice parks and beautiful roads), and Jane Jacobs as someone who spent her career in advocacy and writing trying to formulate a prayer for the things that were getting left out. We want some mixed-use spaces, accidental encounters with neighbors, customers and service providers who are part of our communities, recreational spaces that can be part of our lives rather than day-trip destinations. Urban land use policy is better for her efforts, despite a NIMBY element.

Likewise, we need better prayers for AI that isn't yet a superintelligence to rule us all. Paul Christiano formulated a very abstract version - we want less-than-fully-general tools that try to perform complex information processing tasks in ways we'd say we approved of if we had the time to check. This can allow us to extend our agency over yet more complex processes. This is the idea behind the RLHF (reinforcement learning from human feedback) already used to tune the big corporate language models. Michael Vassar has offered some credible specific ideas - we want AIs to tell us when we're obfuscating and vice-signaling and derailing conversations. We want AIs to make our society's formal dispute resolution systems (legal and bureaucratic) operable again.

Isha Yiras Hashem

Feb 23

Did you write this, or artificial intelligence?

Ben Hoffman

Feb 24 (edited)

Cowrote, forgot to delete the header because distracted with baby & head cold. Sorry! I’ve been having Claude criticize my drafts lately, sometimes editing via Claude, and then iterating until there are no more criticisms I care about. Removing the irrelevant header now.

Isha Yiras Hashem

Feb 24

Thanks! It was a great comment, I was just trying to figure out how best to respond. You'll unfortunately have my flawed human response!

1. This is possibly the best comment anyone has ever made on any of my posts, ever. Definitely top 5.

2. Would you like to collaborate on writing a prayer?

Ben Hoffman

Feb 24 (edited)

In one sense this conversation is already that collaboration. I’m open to a more focused effort, depending on the details. Say more about what you’re imagining?

Isha Yiras Hashem

Feb 25

I keep on annoying people in the slatestarcodex subreddit by bringing up the spiritual side of AI. But there are several reasons I think it's important:

Prayer is a good way of focusing intentions and coming up with a common goal. If other people write better prayers, that's all the better!

Suppose we do end up with aligned AGI or a post scarcity economy, what will be the purpose of individual humans? People will still have a need to find meaning. That meaning can be found in the spiritual world.

Suppose we don't, and everything falls apart. The obvious culprit will be the computer programmers and nerds. Elon Musk should WANT to emphasize how important every individual is from a spiritual perspective. He's certainly not making it happen from a physical perspective. And to be frank, it wouldn't be the first time a king said to kill all the wise men!

My mentor believes that any sufficiently complex intelligence, like our brain, ends up ensouled. Without prayer, it will definitely be an evil soul.

I just wrote this all offhand, no AI.

Isha Yiras Hashem at Substack
