Test sets/New
Name the test set and describe what it covers. You'll add cases on the next screen.
This text is injected into the judge, cluster, and compare prompts so the eval generalizes beyond customer support.