which seems to have the problem of local optimization
HyTAS seems to test specific variants of the same architecture: increasing depth/width. I think the reviewer meant that it does not test different kinds of transformer architectures, e.g. more branches, more concatenations, etc.
The reviewer seems to have read it as a general metric for evaluating any kind of transformer. This is not the case. Perhaps a clarification would be nice?
more detailed optimization
architecture diagram
General Comments
ZICO → ZICO++ ?
What is the difference between CNNs and transformers, and why this architecture specifically?