Non-Relational Inductive Biases
- Activation Functions
- introduce non-linearities (e.g., ReLU, sigmoid, tanh) so the model can capture the non-linear patterns hidden in the data; without them, stacked linear layers collapse into a single linear map
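A minimal sketch of the point above, in plain Python (function names are illustrative): applying ReLU between layers breaks linearity, since the output is no longer a linear function of the input.

```python
import math

def relu(x):
    # ReLU: max(0, x) -- the most common non-linearity
    return max(0.0, x)

def sigmoid(x):
    # Sigmoid squashes any real input into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

# A linear map treats -2 and 2 symmetrically; ReLU does not:
print(relu(-2.0), relu(2.0))   # 0.0 2.0
print(sigmoid(0.0))            # 0.5
```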
- Dropout
- helps the network avoid memorizing the training data by randomly zeroing units during training, forcing different subsets of the network to each learn the data pattern; the resulting model generalizes better
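A sketch of inverted dropout in plain Python (the function and parameter names are illustrative, not a library API): each unit is dropped with probability `p` during training, and survivors are rescaled so the expected activation stays unchanged; at inference time dropout is a no-op.

```python
import random

def dropout(activations, p=0.5, training=True, rng=None):
    # Inverted dropout: zero each unit with probability p during training,
    # scaling survivors by 1/(1-p) so the expected value is unchanged.
    if not training or p == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

acts = [1.0, 2.0, 3.0, 4.0]
dropped = dropout(acts, p=0.5, rng=random.Random(0))
# Each surviving unit is scaled by 1/(1-p) = 2; dropped units become 0.
assert dropout(acts, training=False) == acts  # inference: no-op
```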
- Weight Decay
- constrains the magnitude of the model's weights by penalizing large values (an L2 penalty on the loss), which discourages overfitting
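A sketch of one SGD step with weight decay (illustrative names, not a library API): for plain SGD, adding `weight_decay * w` to the gradient is equivalent to an L2 penalty on the loss, and it shrinks the weights toward zero at every step.

```python
def sgd_step_with_decay(w, grad, lr=0.1, weight_decay=0.01):
    # Weight decay adds lambda * w to the gradient, equivalent (for plain
    # SGD) to an L2 penalty (lambda/2) * ||w||^2 on the loss.
    return [wi - lr * (gi + weight_decay * wi) for wi, gi in zip(w, grad)]

w = [1.0, -2.0]
# With a zero data gradient, the decay term alone pulls weights toward 0:
w_next = sgd_step_with_decay(w, [0.0, 0.0])
```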
- Batch Normalization, Layer Normalization, Instance Normalization
- reduce internal covariate shift by normalizing activations, which stabilizes and speeds up training; the variants differ in which axes the statistics are computed over
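A sketch of the batch-norm forward pass with NumPy (training mode only; `gamma`/`beta` stand in for the learnable affine parameters): each feature is normalized over the batch dimension to zero mean and unit variance.

```python
import numpy as np

def batch_norm(x, eps=1e-5, gamma=1.0, beta=0.0):
    # Normalize each feature (column) over the batch (rows) to zero mean
    # and unit variance, then apply the learnable affine transform.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

x = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
y = batch_norm(x)
# Per-feature mean ~0 and variance ~1, regardless of the input scale.
```

Layer normalization is the same computation with `axis=1` (per sample, over features), which is why it works with batch size 1.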
- Augmentation
- expands the effective training set with label-preserving transforms (e.g., flips, crops, noise), improving generalization
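Augmentation applies label-preserving transforms so the model sees a slightly different version of each sample every epoch. A minimal sketch in plain Python (illustrative transforms: random horizontal flip plus small additive noise on a tiny 2x2 "image"):

```python
import random

def augment(image, rng):
    # Random horizontal flip (label-preserving for most vision tasks)...
    if rng.random() < 0.5:
        out = [row[::-1] for row in image]
    else:
        out = [row[:] for row in image]
    # ...plus small additive noise on each pixel.
    return [[px + rng.uniform(-0.1, 0.1) for px in row] for row in out]

img = [[0.0, 1.0], [1.0, 0.0]]
variants = [augment(img, random.Random(i)) for i in range(4)]
# Same underlying sample, four slightly different views.
```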
- Optimizers
- define how the weights are updated from the gradients (e.g., SGD, SGD with momentum, Adam)
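A sketch of SGD with heavy-ball momentum in plain Python (illustrative names; the test problem is a simple 1-D quadratic, not from the notes): the velocity accumulates past gradients, which speeds up progress along consistent descent directions.

```python
def minimize(grad_fn, w0, lr=0.1, momentum=0.9, steps=200):
    # SGD with momentum: the velocity v accumulates past gradients,
    # and the weight moves against the accumulated direction.
    w, v = w0, 0.0
    for _ in range(steps):
        v = momentum * v + grad_fn(w)
        w = w - lr * v
    return w

# Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w_star = minimize(lambda w: 2.0 * (w - 3.0), w0=0.0)
# Converges toward the minimum at w = 3.
```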