What’s stopping the LLMs from generating a Novel?
This post is about prepping for interviews for Data Science roles. The original post can be found here.
Welcome to today’s data science interview challenge! Today’s challenge is inspired by a recent CS25 Stanford Seminar on Transformers with Andrej Karpathy!
Here you go:
Question 1: What’s stopping the current language models from generating a Novel?
Question 2: How do self-attention and cross-attention differ?
Here are some tips for readers’ reference:
Question 1:
The token length limitation in current large language models (LLMs) such as GPT-3 is one of the main constraints preventing them from generating complete novels or other very long texts. Token length refers to the number of words, subwords, or characters in a text sequence. Each language model has a predefined maximum context window, a fundamental limit imposed by compute and memory costs (self-attention scales quadratically with sequence length). Models in the GPT-3 family typically had context windows of around 2,048 to 4,096 tokens. As of August 2023, GPT-4 boasts a 32K token context window, accommodating inputs, files, and…
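To make the gap concrete, here is a minimal sketch (assuming OpenAI’s tiktoken library and its cl100k_base encoding) that counts tokens in a piece of text and compares a rough, ballpark estimate of a novel’s length against typical context windows; the word count and tokens-per-word figures are illustrative assumptions, not measurements.

```python
# pip install tiktoken
import tiktoken

# Rough assumptions: an average novel runs ~90,000 words,
# and English text averages roughly 1.3 tokens per word.
AVG_NOVEL_WORDS = 90_000
TOKENS_PER_WORD = 1.3

# Context windows referenced in the post (illustrative values).
CONTEXT_WINDOWS = {
    "GPT-3-era model (~4K)": 4_096,
    "GPT-4 32K": 32_768,
}

def count_tokens(text: str, encoding_name: str = "cl100k_base") -> int:
    """Count tokens in `text` using a tiktoken encoding."""
    enc = tiktoken.get_encoding(encoding_name)
    return len(enc.encode(text))

if __name__ == "__main__":
    sample = "Call me Ishmael. Some years ago, never mind how long precisely..."
    print(f"Sample sentence: {count_tokens(sample)} tokens")

    estimated_novel_tokens = int(AVG_NOVEL_WORDS * TOKENS_PER_WORD)
    print(f"Estimated tokens in a full novel: ~{estimated_novel_tokens:,}")

    for name, window in CONTEXT_WINDOWS.items():
        ratio = estimated_novel_tokens / window
        print(f"{name}: {window:,} tokens (a novel is ~{ratio:.0f}x this window)")
```

Even with a 32K window, a full-length novel is several times larger than what the model can attend to at once, which is why long-form generation requires chunking, summarization, or other workarounds rather than a single pass.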