Maximum likelihood criterion
Since each observed output should have high probability under its corresponding distribution , we can choose the model params so they MAXIMIZE the combined probability across all I training examples

- Assumptions
- data are identically distributed

- independent and identically distributed