inter-sentence coherence loss called sentence-order prediction (SOP) is used.
The key problem with the NSP loss is that it merges topic prediction and coherence prediction into one task.
Intuitively, topic prediction is much easier than coherence prediction. Once the model discovers this, it can focus entirely on the easier subtask and neglect coherence prediction, taking the path of least resistance. The authors demonstrate that this is what happens with the NSP task, and they replace it in their work with a sentence-order prediction (SOP) task.
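
To make the difference concrete, here is a minimal sketch of how training pairs for the two tasks might be built. The function names, the 50/50 sampling, and the toy documents are illustrative assumptions, not the authors' actual data pipeline. The point to notice is where the negative example comes from: for NSP it is a segment from another document (so a topic mismatch alone gives the answer away), while for SOP it is the same two consecutive segments in swapped order (so only coherence can tell them apart).

```python
import random


def make_nsp_example(doc, other_doc, rng):
    """NSP-style pair (sketch): the negative takes its second segment
    from a different document, so topic alone can reveal the label."""
    i = rng.randrange(len(doc) - 1)
    first = doc[i]
    if rng.random() < 0.5:
        return first, doc[i + 1], 1        # true next segment
    return first, rng.choice(other_doc), 0  # segment from another document


def make_sop_example(doc, rng):
    """SOP-style pair (sketch): both segments always come from the same
    document; the negative is the same pair with its order swapped."""
    i = rng.randrange(len(doc) - 1)
    first, second = doc[i], doc[i + 1]
    if rng.random() < 0.5:
        return first, second, 1  # original order
    return second, first, 0      # swapped order


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy documents, each a list of consecutive segments.
    cooking = ["Chop the onions.", "Fry them in butter.", "Add the stock."]
    cycling = ["Check the tyres.", "Oil the chain."]
    print(make_nsp_example(cooking, cycling, rng))
    print(make_sop_example(cooking, rng))
```

In the SOP pairs the topic is identical for positives and negatives, so the shortcut described above is not available and the model has to learn something about how sentences follow one another.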