janhq/cortex.cpp

A C++-based local AI API platform that serves language, vision, and speech models using llama.cpp and ONNX runtimes.

★2.8k stars C++ Inference · Serving Language Models

View on GitHub ↗ Homepage ↗

Velocity · 7d

+2.8

★ / day

Trend

→steady

star history

Cortex.cpp is an open-source local inference server that provides a REST API for running AI models on consumer hardware. It wraps llama.cpp and ONNXRuntime to support GGUF quantized models and ONNX-format neural networks across vision, speech, and language modalities. The project provides platform installers for Windows, macOS, and Linux to simplify deployment of local AI inference.