The model answers follow-up questions, challenges incorrect premises, and rejects inappropriate requests.
Reinforcement Learning from Human Feedback
An initial model is trained using supervised fine-tuning: human AI trainers provided conversations in which they played both sides, the user and an AI assistant.
The trainers were given model-written suggestions to help them compose their responses.
This dataset was mixed with that of InstructGPT [3], which was transformed into a dialogue format.
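The supervised fine-tuning step above amounts to a data-preparation problem: each assistant turn in a trainer-written dialogue becomes a training target, conditioned on the conversation so far. The sketch below illustrates one plausible way to flatten such dialogues into (prompt, target) pairs; the role tags, separator format, and function name are illustrative assumptions, not OpenAI's actual pipeline.

```python
def dialogue_to_sft_examples(turns):
    """Flatten a list of (role, text) turns into supervised examples.

    For every assistant turn, the prompt is the conversation so far
    (rendered with role prefixes) and the target is that assistant reply.
    """
    examples = []
    history = []
    for role, text in turns:
        if role == "assistant":
            # Condition on everything said before this assistant turn.
            prompt = "\n".join(f"{r}: {t}" for r, t in history) + "\nassistant:"
            examples.append((prompt, " " + text))
        history.append((role, text))
    return examples

# A toy trainer-written conversation where one person plays both sides.
conversation = [
    ("user", "What is RLHF?"),
    ("assistant", "Reinforcement learning from human feedback."),
    ("user", "Where is it used?"),
    ("assistant", "To fine-tune dialogue models such as ChatGPT."),
]

for prompt, target in dialogue_to_sft_examples(conversation):
    print(repr(prompt), "->", repr(target))
```

Each pair can then be fed to a standard language-model fine-tuning loop, with the loss computed only on the target tokens.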