This activation function simply takes the tanh function and scales it linearly with its input, as follows:
LiSHT(x)=x×tanh(x)
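The definition above translates directly into code. A minimal NumPy sketch (the function name `lisht` is my own choice, not from the paper):

```python
import numpy as np

def lisht(x):
    # LiSHT as defined by Roy et al. (2019): multiply the input by its tanh
    return x * np.tanh(x)

# Because x and tanh(x) always share the same sign, the output is non-negative
print(lisht(np.array([-2.0, 0.0, 2.0])))
```

Note that the output for -2.0 and 2.0 is identical: LiSHT is a symmetric, non-negative function of its input.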
Essentially, LiSHT looks very much like Swish in terms of the first-order derivative. However, the range of this derivative also extends further into the negative domain, which means that the vanishing gradient problem is reduced even further - at least in theory.
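This behavior of the first-order derivative can be checked numerically. Differentiating LiSHT by the product rule gives tanh(x) + x·(1 - tanh²(x)); the sketch below (again with a hypothetical function name) shows how the derivative approaches -1 for strongly negative inputs, rather than flattening toward zero:

```python
import numpy as np

def lisht_grad(x):
    # Product rule on x * tanh(x): tanh(x) + x * (1 - tanh(x)^2)
    t = np.tanh(x)
    return t + x * (1.0 - t ** 2)

# For strongly negative inputs the derivative tends toward -1,
# so the gradient keeps a usable magnitude instead of vanishing
print(lisht_grad(np.array([-5.0, -1.0, 0.0, 1.0, 5.0])))
```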
In their work, Roy et al. (2019) report, based on empirical testing, that the vanishing gradient problem is indeed reduced compared to Swish and traditional ReLU. They also identified additional correlations between network learning and the shape of, for example, the LiSHT loss landscape.