Dilated Sliding Window Attention Analgous to dilated CNN Assuming a fixed d and w for all Layers, Receptive field is l×d×w which can reach tens of thousands of tokens even with small values of d