Speed up text generation of GPT-2 simple
Hi, I am trying to generate a 20-token text using GPT-2 simple. It is taking me around 15 seconds to generate the sentence, while AI Dungeon takes around 4 seconds to generate a sentence of the same size. Is there a way to speed up GPT-2 text generation?
Solution 1:[1]
I think they get quicker results because their program is better optimized and they have greater computing power; they pay a lot for servers. Also, AI Dungeon uses GPT-3, which might simply be faster. I'm struggling with the speed of GPT-2 as well. Let me know if you figure anything out. Cheers
Solution 2:[2]
Text generation models like GPT-2 are slow, and it is of course even worse with bigger models like GPT-J and GPT-NeoX.
If you want to speed up your text generation you have a couple of options:
- Use a GPU. GPT-2 doesn't require much VRAM, so an entry-level GPU will do. On a GPU, generating 20 tokens with GPT-2 shouldn't take more than 1 second (see the sketch after this list).
- Quantize your model and convert it to TensorRT. See this good tutorial: https://github.com/NVIDIA/TensorRT/tree/main/demo/HuggingFace/GPT2
- Serve it through a dedicated inference server (like TorchServe or Triton Inference Server).
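The answer doesn't include code, so here is a minimal sketch of the first option (running generation on a GPU). It uses the Hugging Face transformers implementation of GPT-2 rather than gpt-2-simple, which is an assumption on my part; the prompt and sampling settings are illustrative.

```python
# Minimal sketch: GPT-2 generation on a GPU with Hugging Face transformers.
# Assumption: the question uses gpt-2-simple, but transformers is shown here
# because switching between CPU and GPU is a one-line change.
import time

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").to(device)
model.eval()

prompt = "You enter the dungeon and"  # illustrative prompt
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

start = time.perf_counter()
with torch.no_grad():
    output_ids = model.generate(
        input_ids,
        max_new_tokens=20,          # the 20 tokens mentioned in the question
        do_sample=True,
        top_k=50,
        pad_token_id=tokenizer.eos_token_id,
    )
print(f"generated in {time.perf_counter() - start:.2f}s on {device}")
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

For the second option, the linked TensorRT tutorial covers the quantization and conversion steps; a simpler, CPU-only illustration of the same quantization idea (not TensorRT) is PyTorch dynamic quantization, e.g. `torch.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)`.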
I actually wrote an article about how to speed up inference of transformer-based models. You might find it helpful: how to speed up deep learning inference
Solution 3:[3]
You can use the OpenVINO-optimized version of the GPT-2 model. The demo can be found here. It should be much faster, as it's heavily optimized.
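The answer links to a demo without showing code. As a rough sketch of the same idea, and assuming the optional optimum-intel package (which wraps OpenVINO for Hugging Face models) rather than the exact demo referenced above, GPT-2 can be exported and run like this:

```python
# Minimal sketch: running GPT-2 through OpenVINO via the optimum-intel wrapper.
# Assumption: this uses `pip install optimum[openvino]`, not the demo linked
# in the answer above.
from optimum.intel import OVModelForCausalLM
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
# export=True converts the PyTorch checkpoint to OpenVINO IR on the fly
model = OVModelForCausalLM.from_pretrained("gpt2", export=True)

input_ids = tokenizer("You enter the dungeon and", return_tensors="pt").input_ids
output_ids = model.generate(input_ids, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```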
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Michal Aleksandrowicz |
| Solution 2 | Julien Salinas |
| Solution 3 | dragon7 |