ChatGPT

  • interacts in a conversational way
  • the model answers follow-up questions, challenges incorrect premises and reject inappropriate requests
  • Reinforcement Learning for Human Feedback
  • an initial model is trained using supervised fine-tuning: human AI trainers would provide conversations in which they played both sides, the user and an AI assistant
  • those people would be given the model-written responses to help them compose their response
  • This dataset was mixed to that of InstructGPT [3], which was transformed into a dialogue format