Weight Space Learning
- Damian Borth, University of St. Gallen
- treat weights as data points - representation learning
- can we look at networks → infer the latent factors from the weights?
- is there knowledge inside models that can still be accessed when they are frozen?
- loss surfaces and the optimization problem of NNs are non-convex
- nn training optimization is very high dimensional
- what is the relationship between the characteristics (behavior, performance, etc.) of NNs and their solutions in weight space?
- GDPR: can a trained model be linked back to its training database?
- Hypothesis
- nn populate a structure in weight space
- structure contains info on properties and generating factors of the models
- Encoder Decoder architecture for weight vectors
- then on to down-stream tasks
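A minimal sketch of this idea (all shapes are illustrative, and a plain linear encoder/decoder stands in for whatever architecture was actually used): flatten a model's weights into one vector, encode it into a low-dimensional embedding for down-stream tasks, and train by reconstructing the vector under an MSE loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Flattened weight vector of one model-zoo member (illustrative size).
w = rng.normal(size=2464)

# Linear encoder/decoder as the simplest stand-in for the architecture.
d_latent = 64
E = rng.normal(scale=0.01, size=(d_latent, w.size))   # encoder
D = rng.normal(scale=0.01, size=(w.size, d_latent))   # decoder

z = E @ w                          # embedding used for down-stream tasks
w_hat = D @ z                      # reconstruction of the weight vector
mse = np.mean((w - w_hat) ** 2)    # reconstruction loss to minimize
```

In the real setup `z` is what the linear probes are fitted on; here it just illustrates the encode → decode → MSE pipeline.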
- rather huge model zoo generated
- weight space is sometimes symmetric: ACG architecture
- multiple versions of a NN which do the same thing → these equivalent copies can all be reached from a single learned space
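The permutation symmetry behind this is easy to verify: permuting the hidden units of an MLP (and the matching rows/columns of the adjacent weight matrices) gives a different point in weight space that computes exactly the same function. A small numpy check:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(5, 8))                  # batch of 5 inputs

W1 = rng.normal(size=(8, 16)); b1 = rng.normal(size=16)
W2 = rng.normal(size=(16, 3)); b2 = rng.normal(size=3)

def mlp(x, W1, b1, W2, b2):
    h = np.maximum(x @ W1 + b1, 0.0)         # ReLU hidden layer
    return h @ W2 + b2

p = rng.permutation(16)                      # permute the 16 hidden units
y_orig = mlp(x, W1, b1, W2, b2)
y_perm = mlp(x, W1[:, p], b1[p], W2[p, :], b2)

# Different weight vector, same function.
assert np.allclose(y_orig, y_perm)
```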
- contrastive loss
- linear heads are fitted on the model zoo's validation split
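A hedged sketch of a contrastive loss over weight embeddings (NT-Xent style; the exact loss and augmentations used in the talk may differ): two augmented "views" of the same model should embed close together, while embeddings of different models are pushed apart.

```python
import numpy as np

rng = np.random.default_rng(2)

def nt_xent(z1, z2, tau=0.1):
    """Simplified contrastive (NT-Xent-style) loss between two views.

    z1[i] and z2[i] are embeddings of two augmented views of model i;
    each row of z1 should match its counterpart in z2 and repel the rest.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau                       # cosine similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))            # positives on the diagonal

z1 = rng.normal(size=(8, 64))                # embeddings of 8 weight vectors
z2 = z1 + 0.01 * rng.normal(size=(8, 64))    # slightly perturbed second view
loss = nt_xent(z1, z2)
```

Since the two views here are nearly identical, the loss is close to zero; in training, augmentations of the weight vectors (e.g. permutations, noise) would provide the views.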
- initialization
- random normal
- glorot
- orthogonal
- he normal
- truncated normal
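The listed schemes mostly differ in the variance of the sampled weights. An illustrative numpy version (formulas follow the common definitions; exact framework variants differ, and truncated normal is approximated here by clipping rather than resampling):

```python
import numpy as np

rng = np.random.default_rng(3)
fan_in, fan_out = 256, 128

inits = {
    "random_normal":    rng.normal(0.0, 0.05, (fan_in, fan_out)),
    # Glorot/Xavier: variance scaled by fan_in + fan_out.
    "glorot_normal":    rng.normal(0.0, np.sqrt(2.0 / (fan_in + fan_out)),
                                   (fan_in, fan_out)),
    # He/Kaiming: variance scaled by fan_in (suited to ReLU nets).
    "he_normal":        rng.normal(0.0, np.sqrt(2.0 / fan_in),
                                   (fan_in, fan_out)),
    # Simple stand-in: clip at 2 std instead of proper resampling.
    "truncated_normal": np.clip(rng.normal(0.0, 0.05, (fan_in, fan_out)),
                                -0.1, 0.1),
}

# Orthogonal init: QR-decompose a Gaussian matrix, keep Q.
q, _ = np.linalg.qr(rng.normal(size=(fan_in, fan_out)))
inits["orthogonal"] = q
```

These differing variances are presumably why the initialization scheme leaves a detectable signature in weight space.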
- train and test on MNIST, Fashion-MNIST, CIFAR, SVHN
- hypernetworks don't really work somehow?
- train an encoder-decoder transformer
- one forward pass (encode → decode) already destroys the models' performance
- took the weight vector and calculated the MSE (reconstruction error)
- sample space
- are they just sampling the train set? → some ablations were done to show they're not
- Sequential Autoencoding of Neural Embeddings (SANE)