For my Question Answering Kaggle competition, I wanted to experiment replacing the BERT model with RoBERTa. This means I need to reencode and retokenize the entire Natural Questions dataset into TFRecords. This process was already taking hours with the WordPiece tokenizer used for the