Evaluating the Performance of vLLM: How Did It Do?

by @textmodels

6 Evaluation

In this section, we evaluate the performance of vLLM under a variety of workloads.

6.1 Experimental Setup

Model and server configurations. We use OPT [62] models with 13B, 66B, and 175B parameters and LLaMA [52] with 13B parameters for our evaluation. The 13B and 66B sizes are popular for LLMs, as shown on an LLM leaderboard [38], while 175B is the size of the well-known GPT-3 [5] model. For all of our experiments, we use A2 instances with NVIDIA A100 GPUs on Google Cloud Platform. The detailed model sizes and server configurations are shown in Table 1.
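To see why these model sizes require multi-GPU servers, a back-of-the-envelope calculation of weight memory alone is useful. The sketch below is illustrative only: it assumes FP16 weights (2 bytes per parameter) and 40 GB A100s, and ignores KV cache, activations, and framework overhead, all of which add substantially to the real requirement.

```python
import math

A100_GB = 40  # assumed 40 GB A100 variant; purely for illustration


def weights_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Approximate weight memory in GB, assuming FP16 (2 bytes/param)."""
    return n_params * bytes_per_param / 1e9


for name, n_params in [("OPT-13B", 13e9), ("OPT-66B", 66e9), ("OPT-175B", 175e9)]:
    gb = weights_gb(n_params)
    min_gpus = math.ceil(gb / A100_GB)  # lower bound: weights only
    print(f"{name}: ~{gb:.0f} GB of weights, >= {min_gpus} A100(s)")
```

Even by this lower bound, OPT-66B and OPT-175B cannot fit on a single 40 GB GPU, which is why the server configurations in Table 1 shard the larger models across multiple A100s.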

Workloads. We synthesize workloads based on ShareGPT [51] and Alpaca [50] datasets, which contain input and output texts of real LLM services. The ShareGPT dataset is a collection of user-shared conversations with ChatGPT [35]. The Alpaca dataset is an instruction ...
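A common way to turn such trace datasets into a serving workload is to sample (input length, output length) pairs from the dataset and assign each request a Poisson arrival time, i.e., exponentially distributed inter-arrival gaps. The sketch below illustrates that general approach; the function and parameter names are hypothetical, not the paper's actual harness.

```python
import random


def synthesize_workload(length_pairs, num_requests, request_rate, seed=0):
    """Build a list of (arrival_time, prompt_len, output_len) requests.

    length_pairs: (prompt_len, output_len) tuples sampled from a trace
                  such as ShareGPT or Alpaca (toy stand-in here).
    request_rate: mean requests per second; inter-arrival gaps are drawn
                  from an exponential distribution (Poisson process).
    """
    rng = random.Random(seed)
    requests, t = [], 0.0
    for _ in range(num_requests):
        prompt_len, output_len = rng.choice(length_pairs)
        t += rng.expovariate(request_rate)  # mean gap = 1 / request_rate
        requests.append((t, prompt_len, output_len))
    return requests


# Toy (prompt_len, output_len) pairs standing in for real trace samples.
toy_trace = [(120, 250), (30, 60), (500, 400)]
reqs = synthesize_workload(toy_trace, num_requests=5, request_rate=2.0)
```

Sampling lengths from real traces preserves the long-tailed prompt and output distributions that synthetic fixed-length benchmarks miss, which is what makes these workloads representative of real LLM services.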


Copyright of this story solely belongs to hackernoon.com.