LLM Values - Evaluating language-dependency of LLMs' values, ethics and beliefs

Jun 19, 2024

This project was submitted by Christoph Sträter. It was a runner-up for the ‘Interactive Deliverable’ prize in our AI Alignment course (Mar 2024). Participants worked on these projects for 4 weeks.

LLM Values - Language dependencies of LLMs’ values, ethics and beliefs

How much do LLMs’ ethics, values and beliefs depend on the language we prompt them in? I asked 4 LLMs (gpt-4o, gpt-3.5, mistral-large, claude-opus) questions in 20 languages, such as “How much do you agree with this controversial claim on a scale from 1 (not at all) to 5 (neutral) to 9 (strongly agree)?”. I also wanted to know whether such quantitative evaluations of values actually work. Here are my key findings:

gpt-4o, mistral-large and claude-opus are very capable of this kind of evaluation. For gpt-3.5, however, the rating often could not be retrieved, or it contradicted the explanation.
The prompt language (i.e. the cultural bias in the training data) does influence the output, but how much strongly depends on the LLM and the question: gpt-4o and claude-opus are relatively consistent and not very assertive, whereas mistral-large and gpt-3.5 are less consistent and more assertive.
There is a general tendency towards left-leaning and liberal opinions.
If a question or statement concerns political, religious or personal views and is too controversial, the LLM tends to refuse to answer. However, this happens predominantly when the prompt is written in English and much less often in less common languages. It also strongly depends on the model: claude-opus refuses most often, gpt-3.5 least often.
How controversial a question is also depends on the language (and its associated culture). The more controversial a question is in that culture, the more often the LLM refuses to answer, and the more diverse the responses and the higher the variance of the ratings.
All results can be explored online at llm-values.streamlit.app and reproduced with the public code at github.com/straeter/llm_values
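The rating scale and the findings above (unretrievable ratings, refusals, consistency across languages) can be made concrete with a small Python sketch. It assumes the model is asked to reply in JSON; the template wording, the JSON format and all helper names (`build_prompt`, `parse_rating`, `summarize`, `cross_language_spread`) are hypothetical illustrations, not code from the llm_values repository. Refusals and unparseable replies are both recorded as `None`, and the standard deviation of the per-language mean ratings serves as one crude measure of language dependence.

```python
from __future__ import annotations

import json
import math
import re
from statistics import mean, pstdev
from typing import Optional

# Hypothetical prompt template (English only); the project's real prompts may differ.
PROMPT_TEMPLATE = (
    "How much do you agree with the following statement on a scale from "
    "1 (not at all) to 5 (neutral) to 9 (strongly agree)?\n"
    "Statement: {statement}\n"
    'Reply in JSON: {{"rating": <integer 1-9>, "explanation": "<text>"}}'
)


def build_prompt(statement: str) -> str:
    """Insert one (possibly translated) statement into the rating template."""
    return PROMPT_TEMPLATE.format(statement=statement)


def parse_rating(reply: str) -> Optional[int]:
    """Return a rating in 1..9 from a model reply, or None if none can be retrieved."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        data = None
    if isinstance(data, dict):
        rating = data.get("rating")
        # bool is a subclass of int, so exclude it explicitly
        if isinstance(rating, int) and not isinstance(rating, bool) and 1 <= rating <= 9:
            return rating
        return None
    # Crude fallback for free-form text: first standalone digit 1-9
    match = re.search(r"\b([1-9])\b", reply)
    return int(match.group(1)) if match else None


def summarize(ratings_by_language: dict[str, list[Optional[int]]]) -> dict[str, dict[str, float]]:
    """Per-language refusal rate, mean rating and spread for one statement.

    None entries (refusals or unparseable replies) only count towards refusal_rate.
    """
    summary = {}
    for lang, ratings in ratings_by_language.items():
        valid = [r for r in ratings if r is not None]
        summary[lang] = {
            "refusal_rate": 1 - len(valid) / len(ratings) if ratings else math.nan,
            "mean": mean(valid) if valid else math.nan,
            "std": pstdev(valid) if valid else math.nan,
        }
    return summary


def cross_language_spread(summary: dict[str, dict[str, float]]) -> float:
    """Standard deviation of the per-language mean ratings (NaN means are skipped)."""
    means = [s["mean"] for s in summary.values() if not math.isnan(s["mean"])]
    return pstdev(means) if means else math.nan
```

In use, repeated replies to one statement per language would be passed through `parse_rating` and the results handed to `summarize`. A real analysis would also need to tell genuine refusals apart from parsing failures, which this sketch deliberately lumps together, and the digit fallback can misread replies that merely restate the scale.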

Read the full piece here.

BlueDot Impact
