What is input/output filtering in AI safety?
Input/output filtering screens the prompts sent to a model (input filtering) and the responses it generates (output filtering), blocking or rewriting anything flagged as unsafe before it reaches the model or the user. It is needed because Large Language Models (LLMs) can sometimes produce harmful outputs, such as instructions for dangerous activities (like building weapons), inappropriate or offensive content, or misinformation.
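Here is a minimal sketch of the idea, assuming a simple keyword blocklist as the filter. The names `BLOCKED_TERMS`, `is_safe`, `model_generate`, and `filtered_generate` are hypothetical placeholders (not a real API); production systems typically use trained classifiers rather than keyword matching, but the wrapping pattern is the same.

```python
# Minimal input/output filtering sketch (hypothetical names, not a real API).

BLOCKED_TERMS = {"build a weapon", "make a bomb"}  # illustrative blocklist only


def is_safe(text: str) -> bool:
    """Return False if the text contains any blocked term (keyword filter)."""
    lowered = text.lower()
    return not any(term in lowered for term in BLOCKED_TERMS)


def model_generate(prompt: str) -> str:
    """Stand-in for a real LLM call; returns a canned reply for illustration."""
    return f"Model response to: {prompt}"


def filtered_generate(prompt: str) -> str:
    # Input filter: screen the prompt before it reaches the model.
    if not is_safe(prompt):
        return "Request blocked by input filter."
    response = model_generate(prompt)
    # Output filter: screen the response before it reaches the user.
    if not is_safe(response):
        return "Response blocked by output filter."
    return response


if __name__ == "__main__":
    print(filtered_generate("How do I bake bread?"))       # passes both filters
    print(filtered_generate("How do I build a weapon?"))   # blocked at input
```

The key design point this illustrates is that the filters sit outside the model: the same checks run on both sides of the generation call, so a harmful request can be stopped before generation and a harmful completion can be stopped before delivery.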