liltom-eth/llama2-webui
A Gradio-based web interface for running Llama 2 models locally on GPU or CPU with support for multiple quantization formats and inference backends.

This project provides a user-friendly web UI (built with Gradio) to run Llama 2 models locally on any platform. It supports various model sizes (7B, 13B, 70B) and quantization formats including GPTQ, GGML, and GGUF. Multiple inference backends are supported such as transformers, bitsandbytes for 8-bit, AutoGPTQ for 4-bit, and llama.cpp. The project also exposes an OpenAI-compatible API for integration with other applications and agents.