How to Test NLP Models Through the Lens of Software Engineering
Today we would like to share some tips on what to think about before launching an NLP model to production.
Once we have built our new NLP model and beaten some benchmarks, what should we do next? When should we pull the trigger and use it in a real production setting?
Marco Tulio Ribeiro, a Microsoft researcher and assistant professor at the University of Washington, talked about how to think about testing NLP models in preparation for a successful launch, similar to how we are used to testing software with unit tests, regression tests, and integration tests.
Principle #1: Testing small units (unit tests)[1]
In the case of an NLP model: go beyond the accuracy on a held-out test dataset, and test specific “linguistic capabilities”, such as the following:
- Vocabulary/POS
- Named Entities
- Negation
Principle #2: Behavioral testing (decoupling tests from implementation)
In the case of an NLP model: decouple the test data from the training data, and test behaviors with test types such as:
- Minimum Functionality Test (MFT): you can create a simple MFT for negation in a sentiment analysis model, using very simple examples to check whether your model gets them right.