RL environment with real Shopify product data — 45 products, 184 discoverable issues, shaped rewards
8 issues · 25 steps
Issues listed with suggested commands. Agent fills in params.
12 issues · 35 steps
Issue descriptions shown. Agent picks commands & params.
20 issues · 50 steps
Only category counts. Agent must discover issues itself.
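The three tiers above differ in both hint level and step budget. A minimal sketch of that arithmetic follows; the tier labels (`tier_1` to `tier_3`) and the `Tier` class are hypothetical, since the source names neither, and only the issue counts, step limits, and hint descriptions are taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str       # hypothetical label; the source does not name the tiers
    issues: int     # issues in one episode (from the text above)
    max_steps: int  # step budget for the episode (from the text above)
    hint: str       # what the agent is shown up front

    @property
    def steps_per_issue(self) -> float:
        """Average number of steps available per issue."""
        return self.max_steps / self.issues


TIERS = [
    Tier("tier_1", 8, 25, "issues listed with suggested commands"),
    Tier("tier_2", 12, 35, "issue descriptions only"),
    Tier("tier_3", 20, 50, "category counts only"),
]

for tier in TIERS:
    print(f"{tier.name}: {tier.steps_per_issue:.3f} steps per issue ({tier.hint})")
```

Under these numbers the per-issue step budget shrinks as the hints get sparser, so the harder tiers leave less room for exploratory actions.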
/health - health check
/tasks - enumerate tasks & graders
/reset - reset (body: {})
/step - action (body: {"action":{"command":"...","params":{}}})
/state - current state
/schema - action/observation schemas
/grade - run grader on sample
/ws - persistent WebSocket session
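The request bodies for /reset and /step can be sketched as below. This only builds the requests and does not send them. The base URL, the use of POST, the JSON content type, and the example command and params are all assumptions for illustration; the source gives only the paths and body shapes, and the real command names would come from /schema.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8000"  # assumption: host and port are not given in the source


def reset_payload() -> dict:
    """Body for /reset: an empty JSON object, as listed above."""
    return {}


def step_payload(command: str, params: Optional[dict] = None) -> dict:
    """Body for /step: {"action": {"command": ..., "params": {...}}}."""
    return {"action": {"command": command, "params": params or {}}}


def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON request for the given path."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",  # assumption: the HTTP method is not stated in the source
    )


reset_req = build_request("/reset", reset_payload())
# "example_command" and its params are placeholders, not real environment commands.
step_req = build_request("/step", step_payload("example_command", {"example_key": "example_value"}))

print(reset_req.full_url, reset_req.data.decode("utf-8"))
print(step_req.full_url, step_req.data.decode("utf-8"))
```

Passing either request to `urllib.request.urlopen` would send it, provided a server is actually listening at the assumed address.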