How to Test NLP Models Through the Lens of Software Engineering
Today we would like to share some tips on what to think about before launching an NLP model to production.
Once we have built our new NLP model and beaten some benchmarks, what should we do next? When should we pull the trigger and use it in a real production setting?
Marco Tulio Ribeiro, a Microsoft researcher and assistant professor at the University of Washington, talked about how to think about testing NLP models in preparation for a successful launch, similar to how we are used to testing software with unit tests, regression tests, and integration tests.
Principle #1: Testing small units (unit tests)[1]
In the case of an NLP model: go beyond the accuracy on a held-out test dataset, and test specific “linguistic capabilities”, such as the following:
- Vocabulary/POS
- Named Entities
- Negation
Principle #2: Behavioral testing (decoupling tests from implementation)
In the case of an NLP model: decouple the test data from the training data, and test behaviors with test types such as:
- Minimum Functionality Test (MFT): you can create a simple MFT for negation in a sentiment analysis model, using very simple examples to check whether your model gets them right.