Maximum likelihood criterion

 

Since each observed output should have high probability under its corresponding distribution , we can choose the model params so they MAXIMIZE the combined probability across all I training examples

  • Assumptions
    • data are identically distributed
    • independent and identically distributed