Discussion about this post

Alex

There is a tension in alignment. An AI that perfectly does what we intend would, by most definitions, be considered "aligned". But our long-term wellbeing and moral development might conflict with our intentions! It is well established that we are terrible at knowing what is best for us, and that uncertainty grows with the time horizon. We can be confident that the humans of 2100 will hold different values and morals than we do. Imagine if ASI had been created 200 years ago and left in the hands of chattel slave owners.

So the question is: "How do we preserve our agency while also ensuring we don't use AI to Goodhart ourselves into a permanently bad situation?"
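To make the Goodhart worry concrete, here is a minimal Python sketch (every quantity in it is invented for illustration): an optimizer scored on a proxy metric keeps improving that proxy by spending down a resource the metric never measures, so the measured number climbs while the true goal first improves and then degrades.

```python
import numpy as np

# Toy Goodhart illustration (all quantities invented for this sketch).
# True wellbeing depends on two resources, but the proxy measures only one.

def true_utility(wealth, health):
    # Diminishing returns on wealth; wellbeing collapses without health.
    return np.log1p(wealth) + 2 * np.log1p(health)

def proxy_metric(wealth):
    # The optimizer is scored on wealth alone.
    return wealth

wealth, health = 1.0, 10.0
for step in range(20):
    wealth += 1.0   # each unit of optimization pressure raises the proxy...
    health -= 0.5   # ...by quietly spending down the unmeasured resource
    print(f"step {step:2d}  proxy={proxy_metric(wealth):5.1f}  "
          f"true utility={true_utility(wealth, health):5.2f}")

# The proxy climbs monotonically; true utility rises briefly, then falls.
```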

Kronopath

> The human has some true goal, X, that they want to achieve. This might be something like “make my company valuable”.

This is not the human's true goal. The actual goal varies from person to person, but it could be something like:

- I want to reduce stress in my life by gaining enough financial security to keep a lifestyle that's as good or better than the one I currently have.

- I want to feel good about myself by proving that I'm better or more valuable than my peers at something like business or tech.

- I want to improve the world in some specific way that no other company is currently doing, and believe that there is money to be made in doing so.

Even "I want to get rich" is a bit too simple: why does the human want to get rich? Financial security? Social status or ego? Hedonism? You haven't really gone down to the actual terminal values yet.

A sufficiently advanced AI is likely able to see this deeply into an issue, and arguably a properly aligned AI needs to take that into account.
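One toy way to picture that regress, assuming nothing beyond what the comment already says (the chain entries below are made up for illustration): represent each stated goal as pointing at the goal it serves, and follow the "why do you want that?" links until they run out.

```python
# A hypothetical chain of instrumental goals, each pointing at the goal it
# serves. The entries are invented for illustration, not taken from the post.
goal_chain = {
    "make my company valuable": "get rich",
    "get rich": "financial security",
    "financial security": "reduce stress in my life",
    "reduce stress in my life": None,  # a candidate terminal value
}

def trace_to_terminal(goal, chain):
    """Keep asking 'why do you want that?' until no further answer exists."""
    path = [goal]
    while chain.get(goal) is not None:
        goal = chain[goal]
        path.append(goal)
    return path

print(" -> ".join(trace_to_terminal("make my company valuable", goal_chain)))
# make my company valuable -> get rich -> financial security
#   -> reduce stress in my life
```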
