← all repositories

liltom-eth/llama2-webui

A Gradio-based web interface for running Llama 2 models locally on GPU or CPU with support for multiple quantization formats and inference backends.

1.9k stars Jupyter Notebook Language ModelsInference · Serving
llama2-webui
Velocity · 7d
+1.8
★ / day
Trend
steady
star history

This project provides a user-friendly web UI (built with Gradio) to run Llama 2 models locally on any platform. It supports various model sizes (7B, 13B, 70B) and quantization formats including GPTQ, GGML, and GGUF. Multiple inference backends are supported such as transformers, bitsandbytes for 8-bit, AutoGPTQ for 4-bit, and llama.cpp. The project also exposes an OpenAI-compatible API for integration with other applications and agents.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.