IRT

Item response theory (IRT) assumes that every test item has its own difficulty, and that difficulties differ from item to item (Embretson & Reise, 2000).
Unfortunately, IRT generally assumes that test items are conditionally independent given the student’s competence. This is seldom true of the raw measures collected at the step level by tutoring systems.
Although IRT has powerful features, such as calibration algorithms that empirically determine item difficulties and other parameters, considerable work is needed before it can be applied to tutoring systems.
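
To make the two assumptions concrete, here is a minimal sketch using the one-parameter logistic (Rasch) model, which is one common IRT model and is chosen here only as an illustration; the note itself does not name a specific model. The names `p_correct`, `likelihood`, `theta` (student competence), and `b` (item difficulty) are assumptions made for this example. Conditional independence appears as the plain product over items in `likelihood`.

```python
import math


def p_correct(theta: float, b: float) -> float:
    """Rasch (1PL) probability that a student with competence `theta`
    answers an item of difficulty `b` correctly (illustrative model choice)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def likelihood(theta: float, difficulties: list[float], responses: list[bool]) -> float:
    """Joint probability of a response pattern given `theta`.

    Because items are assumed conditionally independent given competence,
    the joint probability is just the product of per-item probabilities.
    """
    result = 1.0
    for b, correct in zip(difficulties, responses):
        p = p_correct(theta, b)
        result *= p if correct else 1.0 - p
    return result


if __name__ == "__main__":
    # Same student, two items of different difficulty: the harder item
    # (larger b) gets a lower probability of a correct response.
    print(p_correct(theta=1.0, b=0.0))
    print(p_correct(theta=1.0, b=2.0))
    # Probability of one correct and one incorrect answer on two items.
    print(likelihood(0.0, [0.0, 0.0], [True, False]))
```

If step-level tutoring data violate conditional independence, for example because consecutive steps of one problem depend on each other, the product in `likelihood` no longer equals the true joint probability, which is the difficulty the note points to.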

Subhaditya's KB