Weight Space Learning
- Damian Borth, University of St. Gallen
- treat weights as data points - representation learning
- can we look at networks → infer the latent factors from the weights?
- is there knowledge inside models that can still be accessed when they are frozen?
- loss surfaces and the optimization problem of NNs are non-convex
- nn training optimization is very high dimensional
- what is the relationship between the characteristics (behavior, performance, etc.) of NNs and their solutions in weight space?
- GDPR: can a trained model be linked back to its training database?
- Hypothesis
- nn populate a structure in weight space
- structure contains info on properties and generating factors of the models
- Encoder Decoder architecture for weight vectors
- then on to down-stream tasks
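A minimal sketch of this idea (all shapes are illustrative, and a plain linear encoder/decoder stands in for whatever architecture was actually used): flatten a model's weights into one vector, encode it into a low-dimensional embedding for down-stream tasks, and train by reconstructing the vector under an MSE loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Flattened weight vector of one model-zoo member (illustrative size).
w = rng.normal(size=2464)

# Linear encoder/decoder as the simplest stand-in for the architecture.
d_latent = 64
E = rng.normal(scale=0.01, size=(d_latent, w.size))   # encoder
D = rng.normal(scale=0.01, size=(w.size, d_latent))   # decoder

z = E @ w                          # embedding used for down-stream tasks
w_hat = D @ z                      # reconstruction of the weight vector
mse = np.mean((w - w_hat) ** 2)    # reconstruction loss to minimize
```

In the real setup `z` is what the linear probes are fitted on; here it just illustrates the encode → decode → MSE pipeline.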
- rather huge model zoo generated
- weight space is sometimes symmetric: ACG architecture
- multiple versions of a NN which do the same thing → these equivalent copies can all be reached from a single learned space
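The permutation symmetry behind this is easy to verify: permuting the hidden units of an MLP (and the matching rows/columns of the adjacent weight matrices) gives a different point in weight space that computes exactly the same function. A small numpy check:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(5, 8))                  # batch of 5 inputs

W1 = rng.normal(size=(8, 16)); b1 = rng.normal(size=16)
W2 = rng.normal(size=(16, 3)); b2 = rng.normal(size=3)

def mlp(x, W1, b1, W2, b2):
    h = np.maximum(x @ W1 + b1, 0.0)         # ReLU hidden layer
    return h @ W2 + b2

p = rng.permutation(16)                      # permute the 16 hidden units
y_orig = mlp(x, W1, b1, W2, b2)
y_perm = mlp(x, W1[:, p], b1[p], W2[p, :], b2)

# Different weight vector, same function.
assert np.allclose(y_orig, y_perm)
```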
- contrastive loss
- linear heads are fitted on the model zoo's validation split
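A hedged sketch of a contrastive loss over weight embeddings (NT-Xent style; the exact loss and augmentations used in the talk may differ): two augmented "views" of the same model should embed close together, while embeddings of different models are pushed apart.

```python
import numpy as np

rng = np.random.default_rng(2)

def nt_xent(z1, z2, tau=0.1):
    """Simplified contrastive (NT-Xent-style) loss between two views.

    z1[i] and z2[i] are embeddings of two augmented views of model i;
    each row of z1 should match its counterpart in z2 and repel the rest.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau                       # cosine similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))            # positives on the diagonal

z1 = rng.normal(size=(8, 64))                # embeddings of 8 weight vectors
z2 = z1 + 0.01 * rng.normal(size=(8, 64))    # slightly perturbed second view
loss = nt_xent(z1, z2)
```

Since the two views here are nearly identical, the loss is close to zero; in training, augmentations of the weight vectors (e.g. permutations, noise) would provide the views.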
- initialization
- random normal
- glorot
- orthogonal
- he normal
- truncated normal
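The listed schemes mostly differ in the variance of the sampled weights. An illustrative numpy version (formulas follow the common definitions; exact framework variants differ, and truncated normal is approximated here by clipping rather than resampling):

```python
import numpy as np

rng = np.random.default_rng(3)
fan_in, fan_out = 256, 128

inits = {
    "random_normal":    rng.normal(0.0, 0.05, (fan_in, fan_out)),
    # Glorot/Xavier: variance scaled by fan_in + fan_out.
    "glorot_normal":    rng.normal(0.0, np.sqrt(2.0 / (fan_in + fan_out)),
                                   (fan_in, fan_out)),
    # He/Kaiming: variance scaled by fan_in (suited to ReLU nets).
    "he_normal":        rng.normal(0.0, np.sqrt(2.0 / fan_in),
                                   (fan_in, fan_out)),
    # Simple stand-in: clip at 2 std instead of proper resampling.
    "truncated_normal": np.clip(rng.normal(0.0, 0.05, (fan_in, fan_out)),
                                -0.1, 0.1),
}

# Orthogonal init: QR-decompose a Gaussian matrix, keep Q.
q, _ = np.linalg.qr(rng.normal(size=(fan_in, fan_out)))
inits["orthogonal"] = q
```

These differing variances are presumably why the initialization scheme leaves a detectable signature in weight space.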
- train and test on MNIST, Fashion-MNIST, CIFAR, SVHN
- hypernetworks don't really work somehow?
- train an encoder-decoder transformer
- one forward pass (encode → decode) already destroys the models' performance
- took the weight vector and calculated the MSE (reconstruction error)
- sample space
- are they just sampling the train set? → some ablations were done to show they're not
- Sequential Autoencoding of Neural Embeddings (SANE)