The parameters are constrained to take values between zero and one, and they must collectively sum to one to ensure a valid probability distribution Pr(y=k)=λk
Network computes K params from input x
Since the output of the network does not conform to the format, we use Softmax so results are positive, and the K numbers sum to one