I’ve recently had the opportunity to reorganize the essential takeaways regarding CI/CD for MLOps, drawing from valuable insights shared by my friend Hamel Husain in his recent course and blog post.
Today, I’m excited to pass on these insights to you. I hope this information proves helpful, particularly for data scientists transitioning into machine learning deployment, especially those without an engineering background.
In my experience with various analytics and data science individuals and teams, I have observed that the deployment pipeline is often not given much emphasis. At times, we rely on basic, manual pipelines, while in other instances, we’re fortunate to have dedicated MLOps teams to handle the development pipeline handoff.
Consider a statistician on my team who has extensive expertise in Survival Analysis and Experiment Design but may lack the knowledge or experience required to efficiently set up a deployment pipeline.
Now, picture yourself in that situation. Wouldn’t it be nice to reach the next level and be able to build an end-to-end system on your own?
The key difference between CI/CD for software development and CI/CD for ML development is whether the pipeline is driven by the code alone or by things beyond the code.
In traditional software development, CI/CD automates many tasks including testing, building, and deploying software. In this traditional software regime, CI/CD is often triggered through changes in code.
However, CI/CD for ML is different. Testing and deployment of ML can be triggered by many types of events in addition to changes to code, such as new data or labels, drift in the model(s) or data, or a fixed cadence (daily or weekly model re-training).
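To make the list of triggers concrete, here is a minimal Python sketch of how a pipeline might decide whether to kick off a re-training run. It is only an illustration under assumptions of my own: the names `PipelineState` and `triggers_fired`, the drift score, and all of the thresholds are hypothetical and do not come from any particular CI/CD tool or from Hamel’s course.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class PipelineState:
    """Hypothetical snapshot of what the pipeline knows at check time."""
    code_changed: bool = False                # e.g. a new commit landed
    new_labeled_rows: int = 0                 # rows labeled since the last training run
    drift_score: float = 0.0                  # any drift metric, higher means more drift
    last_trained: Optional[datetime] = None   # None means the model was never trained


def triggers_fired(
    state: PipelineState,
    now: datetime,
    min_new_rows: int = 1000,                 # assumed threshold
    drift_threshold: float = 0.2,             # assumed threshold
    cadence: timedelta = timedelta(days=7),   # assumed weekly re-training
) -> List[str]:
    """Return the names of every trigger that currently calls for a run."""
    fired = []
    if state.code_changed:
        fired.append("code_change")           # the traditional software trigger
    if state.new_labeled_rows >= min_new_rows:
        fired.append("new_data")
    if state.drift_score > drift_threshold:
        fired.append("drift")
    if state.last_trained is None or now - state.last_trained >= cadence:
        fired.append("schedule")
    return fired


if __name__ == "__main__":
    state = PipelineState(
        new_labeled_rows=5000,
        drift_score=0.05,
        last_trained=datetime(2024, 1, 14),
    )
    # No code changed, yet the pipeline should still run because of new data.
    print(triggers_fired(state, now=datetime(2024, 1, 15)))  # ['new_data']
```

Only the first check corresponds to the traditional software case; the other three are the "beyond the code" triggers. In practice each check would be wired to a real event source, such as a scheduler or a monitoring job, rather than evaluated from a single in-memory object.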
The following table illustrates these key differences:
The following example is an adaptation of Jacopo Tagliabue’s reference project “You don’t need a bigger boat”, which illustrates an end-to-end…