Multi-Task Learning
- multiple output heads (one per task); a shared trunk of weights can implicitly encode knowledge common to the tasks
- The per-task losses L_i can be combined linearly:
- L(x, y, θ) = Σ_i w_i L_i(f_i(x), y_i, θ_i)
- f_i(x) is the output head for task i, with task-specific weights θ_i
- Only the overall scale of the weights is irrelevant: multiplying the total loss by a positive scalar does not change the optimum. The relative weights w_i do matter, since they set the trade-off between tasks.
- [Hard Parameter Sharing](Hard Parameter Sharing.md)
- [Soft Parameter Sharing](Soft Parameter Sharing.md)
- Mechanisms by which MTL can help:
	- Data amplification (implicit augmentation across tasks)
	- [Attribute Selection](Attribute Selection.md)
	- Eavesdropping
	- [Representation Bias](Representation Bias.md)
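The linear loss combination above can be sketched in plain Python. This is a minimal toy sketch: the trunk, heads, and weights (`shared_trunk`, `head_regression`, `head_classification`, `weights`) are illustrative stand-ins, not from any library or from the linked notes.

```python
# Toy sketch of hard parameter sharing with a linearly combined loss:
# L(x, y, theta) = sum_i w_i * L_i(f_i(x), y_i, theta_i)

def shared_trunk(x):
    # Shared representation used by every task head (the shared "theta").
    return [x * 2.0, x + 1.0]

def head_regression(features):
    # Task-specific output head f_1(x).
    return sum(features)

def head_difference(features):
    # Second task-specific head f_2(x).
    return features[0] - features[1]

def squared_error(pred, target):
    # Per-task loss L_i.
    return (pred - target) ** 2

def multi_task_loss(x, targets, weights):
    # Linear combination of per-task losses with weights w_i.
    feats = shared_trunk(x)
    preds = [head_regression(feats), head_difference(feats)]
    return sum(w * squared_error(p, y)
               for w, p, y in zip(weights, preds, targets))

loss = multi_task_loss(2.0, targets=[6.0, 0.0], weights=[1.0, 0.5])
```

Doubling both weights doubles the loss value but leaves the minimizer unchanged; changing only one weight shifts the trade-off between the two tasks.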