wgryc/phasellm
A Python framework for evaluating and comparing large language models from OpenAI, Cohere, Anthropic and other providers with standardized APIs and evaluation workflows.

PhaseLLM is a framework for evaluating and managing LLM-driven products and experiences. It standardizes API calls across multiple LLM providers (OpenAI, Cohere, Anthropic) so that models can be compared and benchmarked against each other. The framework includes evaluation tools for assessing output quality, along with automations that use advanced models such as GPT-4 to evaluate the output of simpler models, helping users optimize prompts and model selection based on performance and cost.
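
The comparison workflow described above can be illustrated with a minimal, self-contained sketch. It does not use PhaseLLM's actual API: `Model`, `Judge`, `compare_models`, the two stub models and `length_judge` are all hypothetical names invented for this example. The assumption is that every provider can be reduced to a "prompt in, text out" callable, and that a stronger model acts as a judge that picks the better response. Here the judge is a trivial stand-in so the example runs without API keys.

```python
from typing import Callable, Dict, List

# Hypothetical types (not part of PhaseLLM): each provider is wrapped
# as a plain "prompt in, text out" callable.
Model = Callable[[str], str]
# A judge receives the prompt and every model's output, and returns
# the name of the model whose output it prefers.
Judge = Callable[[str, Dict[str, str]], str]


def compare_models(
    models: Dict[str, Model], prompts: List[str], judge: Judge
) -> Dict[str, int]:
    """Run every prompt through every model and count the judge's picks."""
    wins = {name: 0 for name in models}
    for prompt in prompts:
        outputs = {name: model(prompt) for name, model in models.items()}
        wins[judge(prompt, outputs)] += 1
    return wins


# Stub models standing in for real provider calls.
def terse_model(prompt: str) -> str:
    return "Yes."


def verbose_model(prompt: str) -> str:
    return f"Regarding '{prompt}': yes, and here is the reasoning behind it."


def length_judge(prompt: str, outputs: Dict[str, str]) -> str:
    # Placeholder for a stronger evaluator model: it simply prefers the
    # longest answer. A real judge would send the outputs to an LLM.
    return max(outputs, key=lambda name: len(outputs[name]))


results = compare_models(
    {"terse": terse_model, "verbose": verbose_model},
    ["Is the sky blue?", "Is water wet?"],
    length_judge,
)
print(results)
```

In a real setup, each stub would be replaced by a call to the relevant provider, and the judge by a prompt sent to a more capable model. The win counts could then be weighed against per-call cost when choosing a model.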