Multi-Task Learning

  • Multiple outputs (one per task); a shared trunk of weights can indirectly encode common or shared knowledge across tasks
  • Can linearly combine the losses for each task: L = Σ_t λ_t L_t
    • L_t is the loss of output head t, with task weight λ_t
  • Only the relative scale of the weights matters: multiplying the total loss by a positive scalar does not change the optimum.
  • [Hard Parameter Sharing](Hard Parameter Sharing.md)
  • [Soft Parameter Sharing](Soft Parameter Sharing.md)
  • Implicit data augmentation (each task effectively adds training signal for the shared trunk)
  • [Attribute Selection](Attribute Selection.md)
  • Eavesdropping
  • [Representation Bias](Representation Bias.md)
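
The shared-trunk setup with a linearly combined loss can be sketched in plain Python. This is a toy illustration, not a real training setup: the trunk, the two heads, the task names, and the default weights `(1.0, 0.5)` are all illustrative assumptions.

```python
def shared_trunk(x):
    # Shared representation used by every task head (here: a fixed affine map).
    return [2 * x + 1, -x]

def head_regression(h):
    # Task-specific head: predicts a scalar from the shared features.
    return h[0] + 0.5 * h[1]

def head_classification(h):
    # Task-specific head: a score thresholded into a binary label.
    return 1.0 if h[0] - h[1] > 0 else 0.0

def combined_loss(x, y_reg, y_cls, lambdas=(1.0, 0.5)):
    """L = sum_t lambda_t * L_t, one loss term per task head."""
    h = shared_trunk(x)                                # hard parameter sharing
    loss_reg = (head_regression(h) - y_reg) ** 2       # squared error
    loss_cls = abs(head_classification(h) - y_cls)     # 0/1 error
    return lambdas[0] * loss_reg + lambdas[1] * loss_cls
```

Note that doubling both weights, e.g. `lambdas=(2.0, 1.0)`, just doubles the total loss, which is the scale-invariance point above: only the ratio between the task weights affects the optimum.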