Tools for Data Scientists, By Data Scientists
A few years ago, an entrepreneur friend of mine from Google asked me which tools for data scientists I thought were the best. At the time we “predicted” that one of the next waves of burgeoning startups would build tools for data scientists. Looking back a few years later, we were quite spot on!
However, I still haven’t found a single favorite application that solves all of these problems easily and all at once (maybe I’m being greedy 🤓). The hard part is even envisioning what a one-stop-shop solution would look like. You may argue that it’s impossible given how complex data science solutions can be. One of our previous posts also talked about how the data science tech stack is evolving to become more akin to software development.
As a leader of data science organizations, my goal is to simplify the DS tech stack as much as possible without losing much productivity or quality. So there are compromises.
I recently researched Nbdev, built by Hamel Husain, Jeremy Howard and Wasim Lorgat, and I was definitely impressed by it!
What is Nbdev?
The tagline on their site says:
Create delightful software with Jupyter Notebooks
Write, test, document, and distribute software packages and technical articles — all in one place, your notebook.
Wow! Finally! There’s a tool I can use to stay within the comfort zone of notebooks and do more!
It takes your Jupyter Notebook and turns it into a complete software package.
More specifically, as Hamel puts it,
- “It generates a searchable documentation site.
- It offers continuous integration with GitHub Actions.
- It offers an amazing way to do unit tests and testing all within the same context.”
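To make this concrete, here is a minimal sketch of what the cells of an Nbdev notebook might look like, flattened into one Python listing. The `#| default_exp` and `#| export` comments are Nbdev directives that normally sit at the top of their own notebook cells; the module name `core` and the `normalize` function are hypothetical, made up purely for illustration.

```python
# Cell 1: tell Nbdev which module this notebook exports to.
# ("core" is a hypothetical module name used for this sketch.)
#| default_exp core

# Cell 2: the "#| export" directive marks this cell for inclusion in the package.
#| export
def normalize(values):
    "Scale a list of numbers to the range [0, 1]."
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Cell 3: no export directive, so this cell stays in the notebook,
# where it acts as both a unit test and a worked example in the docs.
assert normalize([2, 4, 6]) == [0.0, 0.5, 1.0]
assert normalize([5, 5]) == [0.0, 0.0]
```

The point of the sketch is that the code, its test and its explanation live side by side in the same notebook, which is the "same context" Hamel refers to.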
What are the perks?
Jeremy mentioned that software developers doubt whether notebooks are suitable for “serious” software development. But as a data scientist, I really like the idea that I can “develop software” using a Jupyter Notebook!
Nbdev is a notebook-driven development platform. It gives data scientists full authorship of their models, including high-quality documentation, tests, continuous integration and packaging. It also promotes software engineering best practices among data scientists because its tests…