Research
We study methods for building and evaluating robust, verifiable agentic systems, with an emphasis on faithful retrieval, answerability, and long-horizon behavior.
Focus areas
- NuanceBench: A benchmark that evaluates real-world agent robustness on edge cases such as soft policy boundaries, adversarial requests, and misspecification. (Overview forthcoming.)
- CivSim: A long-horizon, multi-agent civic simulation for exploring emergent institutions, constraints, and interventions.
- Retrieval & verification: Hybrid retrieval, structured evidence, and abstention criteria for enterprise use.