What’s the Difference between ML in Research vs in Production?

Angelina Yang
4 min read · May 11

There are plenty of explanations elsewhere; here I’d like to share some example questions in an interview setting.

What’s the difference between ML in research vs. ML in production?

Source: Blog post

Here are some tips for readers’ reference:

This is an open-ended question. An interviewer may instead ask, “How do you deploy this model in production?” Most data scientists know the model training life cycle well but are less familiar with the deployment process.

Source: CS329 Stanford

1. Goals and Objectives:

  • ML in Research: In the research phase, the primary goal is to explore and develop new algorithms, models, or approaches to solve specific problems. The focus is often on pushing the boundaries of ML techniques, discovering novel methodologies, and publishing research papers. The emphasis is on innovation and achieving high accuracy or performance on benchmark datasets.
  • ML in Production: In the production phase, the primary goal is to deploy ML models into real-world systems or applications to solve practical problems and deliver value to users or customers. The focus is on building robust, scalable, and efficient ML systems that can handle large-scale data, operate reliably, and integrate seamlessly with existing software infrastructure. The emphasis is on stability, efficiency, and maintainability.

2. Data Availability:

  • ML in Research: Researchers often work with publicly available datasets or small-scale datasets specific to their research question. The data is often static. They may focus on gathering or curating specific datasets to evaluate their algorithms or models. The emphasis is on designing experiments and evaluating performance in controlled settings.
  • ML in Production: In production, ML systems often deal with large volumes of real-world data from diverse sources. The data is often messy and constantly shifting. Data pipelines and preprocessing steps are crucial to handle data ingestion, cleaning, transformation…
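
To make the production side of this point concrete, here is a minimal sketch of a cleaning and transformation step followed by a crude check for shifting data. Everything in it is an assumption for illustration: the field names `amount` and `country`, the helper names `clean_records` and `mean_shift`, and the choice of a relative mean difference as the shift signal. Real pipelines typically use dedicated tooling and more robust statistics.

```python
import math
from statistics import mean


def clean_records(raw_records):
    """Drop records whose hypothetical 'amount' field is missing or not a
    finite number, and normalise the hypothetical 'country' field."""
    cleaned = []
    for record in raw_records:
        try:
            amount = float(record.get("amount"))
        except (TypeError, ValueError):
            continue  # missing or non-numeric value: skip the record
        if not math.isfinite(amount):
            continue  # NaN or infinity: skip the record
        country = str(record.get("country") or "unknown").strip().upper()
        cleaned.append({"amount": amount, "country": country})
    return cleaned


def mean_shift(reference_values, live_values):
    """Relative change of the live mean versus the reference mean.
    Assumes the reference mean is non-zero."""
    reference_mean = mean(reference_values)
    return abs(mean(live_values) - reference_mean) / abs(reference_mean)


# Messy input of the kind a production system might receive.
raw = [
    {"amount": "12.5", "country": " us "},
    {"amount": None, "country": "DE"},
    {"amount": "abc"},
    {"amount": 7, "country": None},
]
cleaned = clean_records(raw)

# Compare a live batch against values seen at training time.
shift = mean_shift([10.0, 10.0], [12.0, 18.0])
```

In a research setting the dataset is usually fixed, so a step like `clean_records` runs once. In production it runs on every incoming batch, and a signal like `mean_shift` would be monitored over time to decide when the model needs attention.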