replace traditional convolutional and pooling layers with a more biologically inspired architecture that better captures the spatial relationships between objects in an image
idea that the human visual system is composed of a hierarchy of “capsules” that process visual information at different levels of abstraction.
Each capsule comprises a group of neurons sensitive to specific features of an image, such as the presence of an edge or a particular shape.
These features are then combined and passed up the hierarchy to higher-level capsules, which extract more abstract concepts such as the identity of an object or the presence of a face.
Capsule Networks are able to capture better spatial relationship.
Capsule Networks uses better Downsampling methods which do not cause loss of information seen in CNNs.
Capsule Network perform much better than CNNs but are more computationally epensive.
Drawbacks of pooling layers
pooling layers, which down-sample the input image and can lead to the loss of important information about the spatial relationships between objects in the image.
Capsule networks aim to overcome this limitation by using a different down-sampling mechanism that preserves more spatial information.