ML systems can run to completion without throwing any errors (indicating functional correctness) but can still produce incorrect outputs (indicating behavioral Issues)
CheckList
model-agnostic and task-agnostic methodology for testing NLP models inspired by principles of behavioral testing
matrix of general linguistic capabilities and test types that facilitate comprehensive test ideation
Minimum Functionality Test (MFT): A Minimum Functionality Test (MFT) uses simple examples to make sure the model can perform a specific task well. For example, they might want to test the performance of a sentiment model when dealing with negations
Invariance Test: Besides testing the functionality of a model, they might also want to test if the model prediction stays the same when trivial parts of inputs are slightly perturbed. These tests are called Invariance Tests (IV)
Directional Expectation Test: In the Invariance Test, they expect the outputs after the perturbation to be the same. However, sometimes they might expect the output after perturbation to change. That is when Directional Expectation Tests comes in handy