How do I speed up my TensorFlow Transformer models?
Transformer models have gained much attention in recent years and are responsible for many of the advances in Natural Language Processing (NLP). They have replaced Recurrent Neural Networks in many use cases, such as machine translation, text summarization, and document classification. For organizations, deploying transformer models in production can be challenging because inference can be expensive and the implementation can be complex. We recently announced the public preview of a new runtime that optimizes serving TensorFlow (TF) models on the Vertex AI Prediction service. We are happy to announce that the optimized TensorFlow runtime is now generally available (GA). The optimized TensorFlow runtime generally delivers faster predictions and better throughput than most open-source-based pre-built TensorFlow serving containers.
In this post, you learn how to deploy a fine-tuned T5x base model to the Vertex AI Prediction service using the optimized TensorFlow...