virtually no papers demonstrating cross-dataset generalization, e.g. training on ImageNet, while testing on PASCAL VOC
if our datasets were truly representative of the real world, this would be a very easy thing to do, and would give access to more of the much needed labelled data
But from our perspective, all the datasets are really trying to represent the same domain – our visual world – and we would like to measure how well or badly they do it.
Overall the results look rather depressing, as little generalization appears to be happening beyond the given dataset