Maximum likelihood criterion
Since each observed output should have high probability under its corresponding distribution , we can choose the model params so they MAXIMIZE the combined probability across all I training examples
- Assumptions
- data are identically distributed
- independent and identically distributed