Perplexity is defined as the exponentiated average negative log-likelihood of a sequence.
If we have a tokenized sequence $X = (x_0, x_1, \dots, x_t)$, then the perplexity of $X$ is

$$\text{PPL}(X) = \exp \left\{ -\frac{1}{t} \sum_{i}^{t} \log p_\theta (x_i \mid x_{<i}) \right\}$$

where $\log p_\theta (x_i \mid x_{<i})$ is the log-likelihood of the $i$-th token conditioned on the preceding tokens $x_{<i}$ according to our model.
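As a concrete illustration, here is a minimal sketch of this formula in PyTorch; the per-token log-likelihood values are invented purely for the example:

```python
import torch

# Hypothetical per-token log-likelihoods log p_theta(x_i | x_<i)
# for a 5-token sequence (values invented for illustration).
log_probs = torch.tensor([-2.1, -0.8, -1.5, -3.0, -0.4])

# Perplexity: exponentiated average negative log-likelihood.
ppl = torch.exp(-log_probs.mean())
print(ppl)  # ≈ 4.76
```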
Intuitively, it can be thought of as a measure of the model's uncertainty when predicting among the set of specified tokens in a corpus: a perplexity of $k$ means the model is, on average, as uncertain as if it were choosing uniformly among $k$ possibilities at each step.
Importantly, this means that the tokenization procedure has a direct impact on a model's perplexity, which should always be taken into consideration when comparing different models.
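To see why, consider a hypothetical case (numbers invented for illustration): the same text, assigned the same total log-likelihood by two models, yields different perplexities when their tokenizers split it into different numbers of tokens, because the average in the exponent is taken per token:

```python
import math

# Hypothetical: the same text assigned a total log-likelihood of -30 nats
# by two models that use different tokenizers.
total_log_likelihood = -30.0

# Tokenizer A splits the text into 10 tokens, tokenizer B into 15.
ppl_a = math.exp(-total_log_likelihood / 10)  # ≈ 20.09
ppl_b = math.exp(-total_log_likelihood / 15)  # ≈ 7.39

print(ppl_a, ppl_b)  # same total likelihood, very different perplexities
```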
This is also equivalent to the exponentiation of the cross-entropy between the data and model predictions.
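To make this connection concrete, here is a minimal sketch using a Hugging Face causal language model (gpt2 is chosen purely as an example): when labels are supplied, the returned `loss` is the average cross-entropy over the predicted tokens, so exponentiating it gives the perplexity.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

text = "Perplexity measures how well a model predicts a sequence."
encodings = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Passing labels makes the model return the average cross-entropy
    # over the predicted tokens as `loss`.
    outputs = model(encodings.input_ids, labels=encodings.input_ids)

ppl = torch.exp(outputs.loss)
print(f"Perplexity: {ppl.item():.2f}")
```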