superlinked/sie
An open-source inference server and production cluster that serves 85+ embedding, reranking, and extraction models via a unified API.

SIE (Superlinked Inference Engine) is a production-grade inference server and cluster for embeddings, reranking, and entity extraction. It supports 85+ pre-configured models spanning dense, sparse, multi-vector, vision, and cross-encoder architectures, with hot-swappable model loading and LRU eviction. The system includes a load-balancing gateway, KEDA autoscaling, Grafana dashboards, and Terraform deployment for Kubernetes. It integrates natively with LangChain, LlamaIndex, Haystack, DSPy, CrewAI, Chroma, Qdrant, and Weaviate.
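
To make the idea of hot-swappable loading with LRU eviction concrete, here is a minimal, generic sketch in Python. It is not SIE's implementation or API: `LRUModelCache`, `loader`, `capacity`, and the model names are all hypothetical, and a real server would likely evict on memory pressure rather than on a fixed model count.

```python
from collections import OrderedDict
from typing import Callable, Generic, List, TypeVar

M = TypeVar("M")


class LRUModelCache(Generic[M]):
    """Hypothetical cache: keeps at most `capacity` loaded models in memory."""

    def __init__(self, capacity: int, loader: Callable[[str], M]) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.loader = loader  # assumed callable that loads a model by name
        self._models: "OrderedDict[str, M]" = OrderedDict()

    def get(self, name: str) -> M:
        """Return the model, loading it on first use and evicting if over capacity."""
        if name in self._models:
            # Cache hit: mark this model as the most recently used.
            self._models.move_to_end(name)
            return self._models[name]
        # Cache miss: load on demand, then evict the least recently used entry.
        model = self.loader(name)
        self._models[name] = model
        if len(self._models) > self.capacity:
            self._models.popitem(last=False)
        return model

    def loaded(self) -> List[str]:
        """Names of loaded models, least recently used first."""
        return list(self._models)


if __name__ == "__main__":
    cache = LRUModelCache(capacity=2, loader=lambda name: f"<model {name}>")
    cache.get("dense-a")
    cache.get("reranker-b")
    cache.get("dense-a")   # touch: dense-a becomes most recently used
    cache.get("sparse-c")  # over capacity: reranker-b is evicted
    print(cache.loaded())  # ['dense-a', 'sparse-c']
```

The ordered dictionary keeps recency order for free: a hit moves the entry to the end, and eviction pops from the front, so both operations are constant time.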