Is guidellm open source?

Yes. vllm-project/guidellm is open source, released under the Apache-2.0 license.

What language is guidellm written in?

vllm-project/guidellm is primarily written in Python.

How popular is guidellm?

vllm-project/guidellm has 1.4k stars on GitHub.

Where can I find guidellm?

vllm-project/guidellm is on GitHub at https://github.com/vllm-project/guidellm.

vllm-project/guidellm

A load-testing tool that actually understands LLMs

GuideLLM benchmarks inference servers using token-level metrics and multimodal workloads instead of treating them like generic HTTP endpoints.

★ 1.4k stars | Python | LLMOps · Eval | Inference · Serving

What it does

GuideLLM simulates real-world traffic against OpenAI-compatible or vLLM-native inference servers, then reports the metrics that matter for language models: time-to-first-token (TTFT), inter-token latency (ITL), throughput distributions, and per-request token counts. It supports text, image, audio, and video inputs, draws prompts from Hugging Face datasets or synthetic generators, and exports results as JSON, CSV, or HTML for regression tracking.
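To make the token-level metrics concrete, here is a minimal Python sketch of how TTFT, mean ITL, and output token throughput could be derived from the arrival times of streamed tokens. The RequestTiming class and the function names are hypothetical and exist only for illustration; they are not GuideLLM types, and GuideLLM's own definitions may differ in detail.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RequestTiming:
    """Hypothetical record of one streamed request (not a GuideLLM type)."""
    start: float               # time the request was sent, in seconds
    token_times: List[float]   # arrival time of each output token, in seconds


def ttft(t: RequestTiming) -> float:
    """Time-to-first-token: delay between sending and the first token."""
    return t.token_times[0] - t.start


def mean_itl(t: RequestTiming) -> float:
    """Mean inter-token latency: average gap between consecutive tokens."""
    gaps = [b - a for a, b in zip(t.token_times, t.token_times[1:])]
    return sum(gaps) / len(gaps) if gaps else 0.0


def output_tokens_per_second(t: RequestTiming) -> float:
    """Output tokens divided by the total request duration."""
    duration = t.token_times[-1] - t.start
    return len(t.token_times) / duration


# Made-up timings: first token after 0.5 s, then one token every 0.25 s.
timing = RequestTiming(start=0.0, token_times=[0.5, 0.75, 1.0, 1.25, 1.5])
print(ttft(timing), mean_itl(timing), output_tokens_per_second(timing))
```

A generic HTTP load tester would only see the 1.5 s total; splitting it into a 0.5 s wait and a steady 0.25 s per token is what separates a slow start from slow generation.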

The interesting bit

Most load-testing tools measure HTTP latency and call it a day. GuideLLM treats token generation as a first-class concern, capturing the statistical distributions that determine whether a chatbot feels snappy or sluggish. The comparison table in the README is refreshingly honest: even other LLM-focused tools often lack APIs, multimodal support, or full metrics collection.
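Why distributions matter more than averages can be shown with a small Python sketch using only the standard library. The sample values are made up, and the summarize helper is a hypothetical illustration rather than anything GuideLLM exposes.

```python
import statistics
from typing import Dict, List


def summarize(latencies_ms: List[float]) -> Dict[str, float]:
    """Return mean, median, and 99th percentile for latency samples."""
    ordered = sorted(latencies_ms)
    # With n=100, statistics.quantiles returns the 99 cut points p1..p99.
    cuts = statistics.quantiles(ordered, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "p99": cuts[98],
    }


# Made-up samples: 95 fast responses and 5 slow ones.
samples = [100.0] * 95 + [2000.0] * 5
summary = summarize(samples)
print(summary)
```

Here the mean lands at 195 ms, a latency no request actually experienced, while the median (100 ms) and p99 (2000 ms) describe the two populations users would notice.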

Key highlights

Six traffic profiles including synchronous, concurrent, rate-based, and sweep modes for finding operational limits
Multimodal benchmarking across chat completions, transcription, and translation endpoints
Both CLI and programmatic API for integration into existing pipelines
Container image available for isolated runs
Refactored architecture aimed at extensibility for additional backends and output formats
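The idea behind rate-based and sweep profiles can be sketched in a few lines of Python. This is an illustration under stated assumptions, not GuideLLM's implementation: it assumes a sweep steps through evenly spaced request rates between a lower and an upper bound, and that a constant-rate profile sends requests at fixed intervals. Both function names are hypothetical.

```python
from typing import List


def sweep_rates(min_rate: float, max_rate: float, steps: int) -> List[float]:
    """Evenly spaced request rates from min_rate to max_rate, inclusive."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    if not 0 < min_rate <= max_rate:
        raise ValueError("need 0 < min_rate <= max_rate")
    span = max_rate - min_rate
    return [min_rate + span * i / (steps - 1) for i in range(steps)]


def constant_schedule(rate: float, duration_s: float) -> List[float]:
    """Send offsets (in seconds) for a fixed requests-per-second rate."""
    count = int(rate * duration_s)
    return [i / rate for i in range(count)]


# Assumed bounds: 2 req/s at the low end, 10 req/s at the high end.
rates = sweep_rates(min_rate=2.0, max_rate=10.0, steps=5)
offsets = constant_schedule(rate=4.0, duration_s=1.0)
print(rates)
print(offsets)
```

Running one benchmark per rate and watching where TTFT or ITL starts to climb is one way to locate the operational limit that the sweep mode is meant to find.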

Caveats

Several features are still in active development and not yet available: synthetic multimodal datasets, multi-turn conversations, and speculative decoding views
macOS and Linux only; Windows support is not mentioned
The README’s “Quick Start” leans heavily on vLLM as the example server, though any OpenAI-compatible endpoint works

Verdict

Worth a look if you operate LLM inference in production and need more than “requests per second.” Skip it if you’re just running occasional curl tests against a stable endpoint.
