What is Recursive Reward Modelling?
The main approach used today to ensure AI systems behave in accordance with human preferences is Reinforcement Learning from Human Feedback (RLHF).