Non-Relational Inductive Biases
- Activation Functions
- introduce non-linearities (e.g., ReLU, sigmoid, tanh) so the model can capture the non-linear patterns hidden in the data; without them, stacked linear layers collapse into a single linear map
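A minimal sketch of the point above, in plain Python (function names are illustrative): applying ReLU between layers breaks linearity, since the output is no longer a linear function of the input.

```python
import math

def relu(x):
    # ReLU: max(0, x) -- the most common non-linearity
    return max(0.0, x)

def sigmoid(x):
    # Sigmoid squashes any real input into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

# A linear map treats -2 and 2 symmetrically; ReLU does not:
print(relu(-2.0), relu(2.0))   # 0.0 2.0
print(sigmoid(0.0))            # 0.5
```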
- Dropout
- helps the network avoid memorizing the training data by randomly zeroing units during training, forcing different subsets of the network to each learn the data pattern; the resulting model generalizes better
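A sketch of inverted dropout in plain Python (the function and parameter names are illustrative, not a library API): each unit is dropped with probability `p` during training, and survivors are rescaled so the expected activation stays unchanged; at inference time dropout is a no-op.

```python
import random

def dropout(activations, p=0.5, training=True, rng=None):
    # Inverted dropout: zero each unit with probability p during training,
    # scaling survivors by 1/(1-p) so the expected value is unchanged.
    if not training or p == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

acts = [1.0, 2.0, 3.0, 4.0]
dropped = dropout(acts, p=0.5, rng=random.Random(0))
# Each surviving unit is scaled by 1/(1-p) = 2; dropped units become 0.
assert dropout(acts, training=False) == acts  # inference: no-op
```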
- Weight Decay
- constrains the magnitude of the model's weights by penalizing large values (an L2 penalty on the loss), which discourages overfitting
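A sketch of one SGD step with weight decay (illustrative names, not a library API): for plain SGD, adding `weight_decay * w` to the gradient is equivalent to an L2 penalty on the loss, and it shrinks the weights toward zero at every step.

```python
def sgd_step_with_decay(w, grad, lr=0.1, weight_decay=0.01):
    # Weight decay adds lambda * w to the gradient, equivalent (for plain
    # SGD) to an L2 penalty (lambda/2) * ||w||^2 on the loss.
    return [wi - lr * (gi + weight_decay * wi) for wi, gi in zip(w, grad)]

w = [1.0, -2.0]
# With a zero data gradient, the decay term alone pulls weights toward 0:
w_next = sgd_step_with_decay(w, [0.0, 0.0])
```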
- Batch Normalization, Layer Normalization, Instance Normalization
- reduce internal covariate shift by normalizing activations, which stabilizes and speeds up training; the variants differ in which axes the statistics are computed over
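A sketch of the batch-norm forward pass with NumPy (training mode only; `gamma`/`beta` stand in for the learnable affine parameters): each feature is normalized over the batch dimension to zero mean and unit variance.

```python
import numpy as np

def batch_norm(x, eps=1e-5, gamma=1.0, beta=0.0):
    # Normalize each feature (column) over the batch (rows) to zero mean
    # and unit variance, then apply the learnable affine transform.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

x = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
y = batch_norm(x)
# Per-feature mean ~0 and variance ~1, regardless of the input scale.
```

Layer normalization is the same computation with `axis=1` (per sample, over features), which is why it works with batch size 1.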
- Augmentation
- expands the effective training set with label-preserving transforms (e.g., flips, crops, noise), improving generalization
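Augmentation applies label-preserving transforms so the model sees a slightly different version of each sample every epoch. A minimal sketch in plain Python (illustrative transforms: random horizontal flip plus small additive noise on a tiny 2x2 "image"):

```python
import random

def augment(image, rng):
    # Random horizontal flip (label-preserving for most vision tasks)...
    if rng.random() < 0.5:
        out = [row[::-1] for row in image]
    else:
        out = [row[:] for row in image]
    # ...plus small additive noise on each pixel.
    return [[px + rng.uniform(-0.1, 0.1) for px in row] for row in out]

img = [[0.0, 1.0], [1.0, 0.0]]
variants = [augment(img, random.Random(i)) for i in range(4)]
# Same underlying sample, four slightly different views.
```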
- Optimizers
- define how the weights are updated from the gradients (e.g., SGD, SGD with momentum, Adam)
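A sketch of SGD with heavy-ball momentum in plain Python (illustrative names; the test problem is a simple 1-D quadratic, not from the notes): the velocity accumulates past gradients, which speeds up progress along consistent descent directions.

```python
def minimize(grad_fn, w0, lr=0.1, momentum=0.9, steps=200):
    # SGD with momentum: the velocity v accumulates past gradients,
    # and the weight moves against the accumulated direction.
    w, v = w0, 0.0
    for _ in range(steps):
        v = momentum * v + grad_fn(w)
        w = w - lr * v
    return w

# Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w_star = minimize(lambda w: 2.0 * (w - 3.0), w0=0.0)
# Converges toward the minimum at w = 3.
```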