The [GPT2BPETokenizer](https://github.com/ludwig-ai/ludwig/blob/00c51e0a286c3fa399a07a550e48d0f3deadc57d/ludwig/utils/tokenizers.py#L1085) currently depends on torchtext. We want to remove torchtext as a dependency, so this tokenizer has to be refactored to work without it.
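
As a starting point for discussion, here is a minimal sketch of byte-level BPE in plain Python with no torchtext import. It is not Ludwig's implementation: the class name `SimpleBytePairTokenizer`, the helper `bpe`, the toy merge list, and the simplified whitespace pre-tokenization pattern are all assumptions made for illustration. A real replacement would also need to load the GPT-2 vocab and merges files, map tokens to ids, and use GPT-2's actual pre-tokenization pattern.

```python
import re
from functools import lru_cache
from typing import Dict, List, Tuple


@lru_cache()
def bytes_to_unicode() -> Dict[int, str]:
    """Map each of the 256 byte values to a printable character (GPT-2 convention)."""
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            # Non-printable bytes are shifted to code points above 255.
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


def bpe(token: str, ranks: Dict[Tuple[str, str], int]) -> List[str]:
    """Repeatedly merge the adjacent pair with the lowest rank until none applies."""
    symbols = list(token)
    while len(symbols) > 1:
        pairs = [(symbols[i], symbols[i + 1]) for i in range(len(symbols) - 1)]
        best = min(pairs, key=lambda p: ranks.get(p, float("inf")))
        if best not in ranks:
            break
        merged, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


class SimpleBytePairTokenizer:
    """Hypothetical stand-in for a torchtext-free tokenizer (illustration only)."""

    # Simplified pre-tokenization (assumption): an optional leading space plus
    # a run of non-whitespace, or a run of whitespace.
    _pattern = re.compile(r" ?\S+|\s+")

    def __init__(self, merges: List[Tuple[str, str]]):
        self.byte_encoder = bytes_to_unicode()
        # Earlier merges get lower ranks and are applied first.
        self.ranks = {pair: i for i, pair in enumerate(merges)}

    def __call__(self, text: str) -> List[str]:
        tokens: List[str] = []
        for piece in self._pattern.findall(text):
            mapped = "".join(self.byte_encoder[b] for b in piece.encode("utf-8"))
            tokens.extend(bpe(mapped, self.ranks))
        return tokens


# Toy merge list, made up for this example (not the GPT-2 merges file).
tokenizer = SimpleBytePairTokenizer([("l", "o"), ("lo", "w"), ("e", "r")])
print(tokenizer("low lower"))
```

Another possible route, if adding the dependency is acceptable, would be to delegate to an existing GPT-2 tokenizer implementation such as the one in Hugging Face `transformers` rather than maintaining the BPE logic in Ludwig.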