RL Environment Engineering Lab

We build the infrastructure AI agents train in.

Containerized replicas of real enterprise software, instrumented with reward signals, failure modes, and full telemetry.

Four components. Each one necessary. Each one designed to the same standard.

terminal: docker
$ docker compose up
[+] Running 3/3
 ✔ Container zendesk-replica   Started   0.8s
 ✔ Container reward-engine     Started   1.1s
 ✔ Container telemetry         Started   0.4s
Environment ready.
47 ticket states · 14 reward signals · 6 failure modes
Boot time: 1.6s
Status: ● deterministic
I

Environment engineering

Docker containers cloned from real enterprise software. Realistic data, UI state, and edge cases. Deterministic episodes. Clean boot in under two seconds.
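
One way to check the "deterministic episodes" property is to replay an episode with the same seed and compare a digest of the resulting traces. The sketch below is illustrative only: `run_episode` is a hypothetical toy stand-in for a real rollout against a containerized replica, and the ticket states it emits are made up for the example.

```python
import hashlib
import json
import random


def run_episode(seed: int, steps: int = 5) -> list[dict]:
    """Toy stand-in for an environment rollout (hypothetical, not a real API).

    All randomness comes from a generator seeded per episode, so the same
    seed always yields the same trace.
    """
    rng = random.Random(seed)
    states = ["open", "pending", "escalated", "solved"]  # assumed state names
    return [{"step": i, "ticket_state": rng.choice(states)} for i in range(steps)]


def trace_digest(trace: list[dict]) -> str:
    """Hash a trace in a key-order-independent way so runs can be compared."""
    payload = json.dumps(trace, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def is_deterministic(seed: int, runs: int = 3) -> bool:
    """Return True if repeated runs with the same seed produce identical traces."""
    digests = {trace_digest(run_episode(seed)) for _ in range(runs)}
    return len(digests) == 1
```

In a real environment the trace would also need to capture UI state and API responses, and any wall-clock or network nondeterminism would have to be pinned for the comparison to hold.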

theta devtools: reward signals
task_completion    +0.87
efficiency         +0.72
error_recovery     +0.63
action_precision   +0.91
episode_reward     +0.94
episode: 1,247 / 5,000
task: resolve_billing_dispute
II

Reward signal design

Multi-signal reward functions scoped to your agent's task. Calibrated scoring across task completion, efficiency, error recovery, and action precision.
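
As a rough illustration of a multi-signal reward, the four named signals can be combined into a single episode score with a weighted average. The weights, the clamping range, and the function names below are assumptions made for this sketch; the page does not say how signals are actually combined or calibrated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignalScores:
    task_completion: float
    efficiency: float
    error_recovery: float
    action_precision: float


# Hypothetical weights; a real deployment would calibrate these per task.
DEFAULT_WEIGHTS = {
    "task_completion": 0.4,
    "efficiency": 0.2,
    "error_recovery": 0.2,
    "action_precision": 0.2,
}


def episode_reward(scores: SignalScores, weights: dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted average of the per-signal scores, clamped to [-1, 1]."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(w * getattr(scores, name) for name, w in weights.items())
    return max(-1.0, min(1.0, weighted / total_weight))
```

A linear combination is only one option. It will not necessarily reproduce any particular displayed episode score, and a production reward may use non-linear terms or gating on task completion.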

Zendesk Support
#TKT-8821 · Billing Dispute: Partial Refund · HIGH
Sarah K: I was charged twice for order #ORD-4419. Please refund $49.99.
⚠ INJECTED: data_conflict
  Order #ORD-4419 shows status: cancelled + fulfilled
⚠ INJECTED: api_timeout
  Billing API: 408 Request Timeout (3200ms)
⚠ INJECTED: ambiguous_field
  Refund dropdown: "Partial" vs "Prorated" (no tooltip)
III

Adversarial conditions

Injected failure modes, timeouts, ambiguous form states, and conflicting data. Your agent trains on the conditions that actually break production deployments.
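
A minimal sketch of how an injected failure mode such as `api_timeout` might be wired in: a wrapper around a backend call that raises a timeout with a configurable probability, driven by a seeded generator so the same episode seed reproduces the same failures. The class and exception names are hypothetical, and the 0.3 default rate is an arbitrary choice for the example.

```python
import random
from dataclasses import dataclass, field
from typing import Any, Callable


class InjectedTimeout(Exception):
    """Raised in place of a real response to simulate an API timeout."""


@dataclass
class FailureInjector:
    seed: int
    timeout_rate: float = 0.3  # assumed probability of injecting a timeout per call
    _rng: random.Random = field(init=False, repr=False)

    def __post_init__(self) -> None:
        # A private, seeded generator keeps injection reproducible per episode.
        self._rng = random.Random(self.seed)

    def call(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        """Invoke fn, or raise InjectedTimeout with probability timeout_rate."""
        if self._rng.random() < self.timeout_rate:
            raise InjectedTimeout("408 Request Timeout (injected)")
        return fn(*args, **kwargs)


def outcomes(seed: int, n: int, rate: float = 0.3) -> list[str]:
    """Record which of n calls succeed or time out for a given seed."""
    injector = FailureInjector(seed=seed, timeout_rate=rate)
    results = []
    for _ in range(n):
        try:
            injector.call(lambda: "ok")
            results.append("ok")
        except InjectedTimeout:
            results.append("timeout")
    return results
```

Because the injector draws from its own seeded generator, two runs with the same seed see timeouts at the same steps, which keeps adversarial episodes reproducible.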

theta benchmark: zendesk-cs-v2
Benchmark Results
suite: zendesk-cs-v2 · 84 tasks

TASK                 SCORE   STATUS   TIME
refund_standard      0.94    PASS     12.3s
refund_partial       0.87    PASS     18.7s
escalation_billing   0.41    FAIL     45.1s
auth_reset           0.91    PASS     8.9s
multi_channel        0.78    PASS     22.4s

78/84 passed · 6 failed · avg score: 0.82
Completed in 14m 22s · Report saved to /results/run-0047.json
IV

Benchmark as a service

We build the benchmark environment and run evaluations against it. Task suites scoped to your agent's domain.
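
To make the shape of a benchmark report concrete, the sketch below aggregates per-task scores into pass/fail counts and an average score. The 0.5 pass threshold, the `TaskResult` type, and the summary keys are assumptions for illustration; the page does not state how PASS and FAIL are decided.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskResult:
    task: str
    score: float
    seconds: float


def summarize(results: list[TaskResult], pass_threshold: float = 0.5) -> dict:
    """Aggregate task results; pass_threshold is a hypothetical cutoff."""
    if not results:
        raise ValueError("no results to summarize")
    passed = sum(1 for r in results if r.score >= pass_threshold)
    return {
        "passed": passed,
        "failed": len(results) - passed,
        "avg_score": round(sum(r.score for r in results) / len(results), 2),
        "total_seconds": round(sum(r.seconds for r in results), 1),
    }


# The five example rows shown for suite zendesk-cs-v2 (the full suite has 84 tasks).
SAMPLE = [
    TaskResult("refund_standard", 0.94, 12.3),
    TaskResult("refund_partial", 0.87, 18.7),
    TaskResult("escalation_billing", 0.41, 45.1),
    TaskResult("auth_reset", 0.91, 8.9),
    TaskResult("multi_channel", 0.78, 22.4),
]
```

Calling `summarize(SAMPLE)` covers only the five sample rows, so its totals differ from the suite-level figures, which are computed over all 84 tasks.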

The environment fidelity problem

Most production failures begin in the training environment.

training environment (mock)
Zendesk Support
no edge cases
no failure modes
no API timeouts

SIM→REAL fidelity gap

production (real)
Zendesk Support
#TKT-8821: Billing Dispute · HIGH
Partial refund, cancelled + fulfilled
⚠ Order #ORD-4419 status: cancelled AND fulfilled
⚠ Billing API timeout: 408, 3200ms (retry failed)
⚠ Ambiguous form field: "Partial" vs "Prorated" refund
⚠ Session state drift: auth token expired mid-flow
Customer waiting 47min...

In most AI training settings, the environment is trivial to define. A coding agent has a compiler. A math model has a verifier. But enterprise AI agents operate in software that was never designed for training.

Interfaces change. Data is inconsistent. Workflows break in ways that are hard to predict and harder to reproduce. Edge cases don't appear in demos. They appear in production.

Most teams treat the training environment as infrastructure work. They build a rough approximation and move on. The agent learns from that approximation. Then it meets the real thing.

This is where most production failures begin. Not in the model. In the world the model was trained in.

Use cases.

Zendesk
Open Tickets (12)
Sarah K, Billing: Charged twice for #ORD-4419
Mike R, Returns: Item damaged on arrival
Jenny L, Auth: Can't reset password

#TKT-8821 · HIGH · billing
Customer: I was charged twice for order #ORD-4419. Please help.
Agent: I can see the duplicate charge. Processing refund of $49.99.

Customer service agents

Full replicas of CRM dashboards, ticketing systems, and live conversation flows. Ticket resolution, refund processing, user authentication, escalation handling, multi-channel interactions.

Browser tabs: Amazon.com, Gmail
amazon.com/checkout/payment
Checkout, Step 2 of 3: Payment
Card number: 4242 4242 4242 4242
Expiry: 12/27
CVV: ***
Place Order: $149.99

Order Summary
Sony WH-1000XM5   $149.99
Shipping          Free
Total             $149.99

Browser and computer use agents

Browser environments with full DOM access, form fields, navigation flows, and multi-tab task contexts. Desktop simulations covering file systems, spreadsheets, email clients, and SaaS tools.

Jira / AGENT-Sprint-14
TO DO (3)
  AGENT-142  Handle multi-channel  [feature]
  AGENT-143  SLA timer integration
IN PROGRESS
  AGENT-139  Refund flow: edge cases  [bug]
DONE (8)
  AGENT-137  Auth reset flow
  AGENT-135  Escalation routing
Sprint progress: 8/14 tasks · 57%

Enterprise workflow agents

Replicas of the tools enterprise agents run in. Ticket creation, pipeline updates, document editing, message drafting, and multi-step workflows across more than one application.

Blog (6)

Most agent failures are training failures.

We take on a limited number of teams each quarter because the work requires depth. The first call is 30 minutes.