NVIDIA-NeMo/Gym
A library for evaluating and improving AI agents and language models across scalable, stateful environments with shared benchmarks and verifiers.

NeMo Gym provides infrastructure to develop environments for AI agents, run evaluation and training at scale, and access a collection of benchmark environments. An environment comprises a dataset of tasks, an agent harness that defines how the model interacts with the world, a verifier that scores task completion, and per-task execution state. NeMo Gym supports moving between evaluation, agent optimization, and training workflows at scale.
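
To make the four parts of an environment concrete, here is a minimal sketch in plain Python. It is not NeMo Gym's API: every name in it (`Task`, `ExecutionState`, `Environment`, `single_turn_harness`, `exact_match`, `toy_model`) is hypothetical and chosen only for illustration, and the single-turn harness and exact-match verifier are assumed stand-ins for whatever a real environment would use.

```python
from dataclasses import dataclass, field
from typing import Callable

# All names below are hypothetical illustrations, not NeMo Gym identifiers.

@dataclass
class Task:
    """One item from the environment's dataset of tasks."""
    prompt: str
    expected: str


@dataclass
class ExecutionState:
    """Per-task execution state; a fresh instance is created for each task."""
    transcript: list[str] = field(default_factory=list)


Model = Callable[[str], str]


@dataclass
class Environment:
    tasks: list[Task]                                       # dataset of tasks
    harness: Callable[[Model, Task, ExecutionState], str]   # model-world interaction
    verifier: Callable[[Task, str], float]                  # scores task completion

    def evaluate(self, model: Model) -> float:
        """Run every task through the harness and return the mean verifier score."""
        if not self.tasks:
            return 0.0
        scores = []
        for task in self.tasks:
            state = ExecutionState()  # state is isolated per task
            answer = self.harness(model, task, state)
            scores.append(self.verifier(task, answer))
        return sum(scores) / len(scores)


def single_turn_harness(model: Model, task: Task, state: ExecutionState) -> str:
    """Assumed harness: one prompt, one reply, both recorded in the state."""
    state.transcript.append(task.prompt)
    answer = model(task.prompt)
    state.transcript.append(answer)
    return answer


def exact_match(task: Task, answer: str) -> float:
    """Assumed verifier: 1.0 for an exact match after trimming, else 0.0."""
    return 1.0 if answer.strip() == task.expected else 0.0


def toy_model(prompt: str) -> str:
    """Stand-in for a language model; it knows exactly one answer."""
    return "4" if prompt == "2 + 2 = ?" else "unknown"


env = Environment(
    tasks=[Task("2 + 2 = ?", "4"), Task("Capital of France?", "Paris")],
    harness=single_turn_harness,
    verifier=exact_match,
)
mean_score = env.evaluate(toy_model)
```

In this sketch the same `Environment` object could feed the verifier score to an evaluation report or use it as a reward signal during training, which is the kind of reuse across workflows the description above refers to.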