
Google Keyboard Federated Learning (Notes only; refer to [35] for code)

[34] Google Keyboard Federated Learning (Notes only; refer to [35] for code) - Chen, M., Mathews, R., Ouyang, T., & Beaufays, F. (2019). Federated learning of out-of-vocabulary words. arXiv preprint arXiv:1903.10635. Paper

Paper notes

A character-level recurrent neural network is able to learn out-of-vocabulary (OOV) words
High-frequency words can be sampled from the trained generative model by drawing from the joint posterior directly
Studied in two settings: (1) simulation and (2) the Gboard app
The goal is to learn frequently typed words
Words missing from the vocabulary cannot be predicted on the keyboard suggestion strip
Models such as neural machine translation (NMT) rely on a vocabulary to encode words during end-to-end training
Federated learning uploads only ephemeral model updates to the server for aggregation and leaves the users’ raw data on their devices
The privacy risk of unintended memorization still exists
We further show that the top sampled words are very meaningful and are able to capture words we know to be trending in the news at the time of the experiments
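
The sampling idea in these notes can be sketched as follows. This is a minimal illustration and not the paper's implementation. TOY_MODEL is a hypothetical hand-written table of next-character probabilities standing in for the trained character-level LSTM, and sample_word and top_sampled_words are names invented here. Drawing characters one at a time from the conditional distributions gives a sample from the joint distribution over whole words, and counting many independent draws surfaces the high-frequency words.

```python
import random
from collections import Counter

# Hypothetical stand-in for the trained character-level model: next-character
# probabilities conditioned on the previous character only (a real LSTM would
# condition on the whole prefix). "^" marks word start, "$" marks word end.
TOY_MODEL = {
    "^": {"l": 0.6, "o": 0.4},
    "l": {"o": 1.0},
    "o": {"l": 0.5, "$": 0.5},
}


def sample_word(model, rng, max_len=20):
    """Draw one word character by character until the end token appears."""
    prev, chars = "^", []
    while len(chars) < max_len:
        options = model[prev]
        nxt = rng.choices(list(options), weights=list(options.values()))[0]
        if nxt == "$":
            break
        chars.append(nxt)
        prev = nxt
    return "".join(chars)


def top_sampled_words(model, num_samples, k, seed=0):
    """Sample many words independently and return the k most frequent."""
    rng = random.Random(seed)
    counts = Counter(sample_word(model, rng) for _ in range(num_samples))
    return counts.most_common(k)


if __name__ == "__main__":
    for word, count in top_sampled_words(TOY_MODEL, num_samples=5000, k=3):
        print(word, count)
```

Because each draw is independent of the others, the draws could be split across threads without any coordination between them.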

LSTM Modeling

In this work we use a variant of LSTM with a Coupled Input and Forget Gate (CIFG) (Greff et al., 2017), peephole connections (Gers and Schmidhuber, 2000) and a projection layer (Sak et al., 2014)
The model is trained with cross-entropy loss
This parallel sampling approach avoids dependencies between sampling threads, which might occur in beam search or shortest path search sampling
Adaptive L2-norm clipping is performed on each client’s gradient, as it is found to improve the robustness of model convergence
The larger model (on training with Reddit data) does not lead to significant gains. Momentum and adaptive clipping lead to faster convergence and more stable performance.
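
A rough sketch of per-client L2-norm clipping followed by server-side aggregation is shown below. It rests on several assumptions that are not taken from the paper: the clipping threshold clip_norm is fixed rather than adaptive, the server takes an unweighted mean of the clipped updates, and the function names clip_by_l2_norm and aggregate are invented for illustration.

```python
import math


def clip_by_l2_norm(update, clip_norm):
    """Rescale an update so its L2 norm is at most clip_norm.

    Assumption: clip_norm is a fixed threshold. The notes mention adaptive
    clipping, whose adaptation rule is not described here.
    """
    norm = math.sqrt(sum(x * x for x in update))
    if norm <= clip_norm:
        return list(update)
    scale = clip_norm / norm
    return [x * scale for x in update]


def aggregate(client_updates, clip_norm):
    """Clip each client's update, then average them coordinate-wise.

    Assumption: an unweighted mean; a real system may weight clients.
    """
    clipped = [clip_by_l2_norm(u, clip_norm) for u in client_updates]
    n = len(clipped)
    return [sum(vals) / n for vals in zip(*clipped)]


if __name__ == "__main__":
    updates = [[3.0, 4.0], [0.0, 0.5]]  # made-up client updates
    print(aggregate(updates, clip_norm=1.0))
```

Clipping bounds how far any single client can move the averaged update, which is one plausible reason it helps convergence stay stable.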

Conclusion

We also perform live experiments with on-device data from 3 populations of Gboard users and demonstrate that this method can learn OOV words effectively in a real-world setting.
