
What does the tokenizer do for a language model?

2 min read · Sep 22, 2023

This post is about prepping for interviews for Data Science roles. The original post can be found here.

Welcome to today’s data science interview challenge! Today’s challenge is inspired by a Hugging Face Transformers lecture (2022 version) at Stanford! Relax!

A warm-up question 🤓:

See if you can tell me (without writing it down) what the code looks like that creates a torch.tensor with the following contents:

Now tell me what the code looks like to compute the average of each row (.mean()) and of each column. What are the shapes of the results?

I usually don’t do live coding questions but this one is straightforward and you should be able to speak while thinking. Have fun!

Now back to the basics:

Question: What does the tokenizer do for a language model?

Source: Paper

Here are some tips for readers’ reference:

Warm-up question:

Is the following what you are envisioning?
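Here is a minimal sketch of one possible answer. The tensor values below are hypothetical, chosen only for illustration, since the contents from the question are not reproduced here. The idea is that `dim=1` averages across the columns of each row, while `dim=0` averages down the rows of each column.

```python
import torch

# Hypothetical 2x3 tensor; the values are assumptions for illustration only.
x = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])

# Average of each row: reduce over the column dimension (dim=1).
row_means = x.mean(dim=1)

# Average of each column: reduce over the row dimension (dim=0).
col_means = x.mean(dim=0)

print(row_means, row_means.shape)  # one value per row
print(col_means, col_means.shape)  # one value per column
```

For a tensor of shape (2, 3), the row averages have shape (2,) and the column averages have shape (3,). Note that .mean() needs a floating-point tensor, so an integer tensor would have to be converted first, for example with .float().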

What does the tokenizer do for a language model?
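In short, a tokenizer converts raw text into a sequence of integer IDs that the model can consume, and converts IDs back into text. The toy example below is only a sketch of that idea: the class name `ToyTokenizer` and its methods are hypothetical, and it splits on whitespace, whereas real tokenizers typically use subword vocabularies (for example BPE or WordPiece).

```python
class ToyTokenizer:
    """Hypothetical word-level tokenizer, for illustration only."""

    def __init__(self, corpus):
        # Reserve id 0 for tokens that are not in the vocabulary.
        self.token_to_id = {"[UNK]": 0}
        for text in corpus:
            for token in text.lower().split():
                # Assign the next free id to each new token.
                self.token_to_id.setdefault(token, len(self.token_to_id))
        self.id_to_token = {i: t for t, i in self.token_to_id.items()}

    def encode(self, text):
        # Text -> list of integer ids (unknown words map to 0).
        return [self.token_to_id.get(tok, 0) for tok in text.lower().split()]

    def decode(self, ids):
        # List of integer ids -> text.
        return " ".join(self.id_to_token[i] for i in ids)


tokenizer = ToyTokenizer(["the cat sat", "the dog ran"])
ids = tokenizer.encode("the cat ran fast")
print(ids)                    # [1, 2, 5, 0]
print(tokenizer.decode(ids))  # the cat ran [UNK]
```

The word "fast" never appeared in the toy corpus, so it maps to the unknown-token ID. Subword tokenizers reduce this problem by breaking rare words into smaller pieces that are in the vocabulary.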


Written by Angelina Yang
