Say Goodbye to Tokens, and Say Hello to Patches
BLT is a new way to scale language models. Instead of pre-defining tokens, it looks at the raw bytes of text and groups them based on how predictable they are. BLT models with patch sizes 6 and 8 quickly surpass Llama 2 and 3.
Do we really need to break text into tokens, or could we work directly with raw bytes?
First, let’s think about how LLMs currently handle text. They chop it up into chunks called tokens using rules about common word pieces. This tokenization step has always been a bit of an odd one out: while the rest of the model learns and adapts during training, tokenization stays fixed, based on those initial rules. This can cause problems, especially for languages that aren’t well-represented in the training data or when handling unusual text ...
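To make the contrast concrete, here is a minimal toy sketch of entropy-based byte patching. It is purely illustrative: the real BLT uses a small byte-level language model to estimate next-byte entropy, whereas this sketch stands in a simple unigram frequency model. The threshold value and function names are assumptions, not BLT's actual API.

```python
# Toy sketch of entropy-based byte patching (illustrative only).
# Assumption: a unigram frequency model stands in for BLT's byte-level LM.
import math
from collections import Counter

def byte_surprise(data: bytes) -> list:
    """Estimate per-byte surprise (-log2 p) from unigram byte frequencies."""
    counts = Counter(data)
    total = len(data)
    return [-math.log2(counts[b] / total) for b in data]

def entropy_patches(data: bytes, threshold: float) -> list:
    """Start a new patch whenever a byte's surprise exceeds the threshold.

    Predictable (frequent) bytes get grouped into longer patches; surprising
    bytes trigger a patch boundary -- the core idea behind BLT's dynamic
    patching, in miniature.
    """
    surprises = byte_surprise(data)
    patches, current = [], bytearray()
    for b, s in zip(data, surprises):
        if current and s > threshold:
            patches.append(bytes(current))
            current = bytearray()
        current.append(b)
    if current:
        patches.append(bytes(current))
    return patches

text = b"the cat sat on the mat"
patches = entropy_patches(text, threshold=3.0)
print(patches)
```

Running this, common bytes like `t`, `a`, and the space character cluster into multi-byte patches, while rare bytes start fresh patches. A fixed tokenizer, by contrast, would apply the same segmentation rules regardless of how predictable the input is.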
Copyright of this story solely belongs to hackernoon.com. To see the full text click HERE