About
I'm a 3rd-year PhD student at UC San Diego, advised by Dr. Prithviraj Ammanabrolu in the PEARLS Lab, where we're united by our love of boba tea and our shared interest in embodied RL agents and interactive, human-aligned AI systems.
In general, I'm interested in all things Natural Language Processing and Reinforcement Learning, and in systems that collaborate with people to improve the human experience as a whole.
Most of my current work centers on the reasoning capabilities of LLMs, especially in multi-turn agentic settings, where the standard paradigm of "burn tokens until something happens" can actually hurt performance. I think about how to make reasoning monitorable and controllable from the outside, how to train agents that can recover from their own bad reasoning, and how to stress-test what current LLMs can and can't do as agents in text environments.
This summer (June–August 2026), I'll be interning at Microsoft Research Montréal with Marc-Alexandre Côté and Xingdi "Eric" Yuan, continuing our collaboration on LLM agents and situated text environments.
Prior to UCSD, I did my Master's at the Georgia Institute of Technology under the guidance of Dr. Thad Starner and Dr. Mark Riedl. I also had the privilege of serving as a Head Graduate Teaching Assistant for Dr. Thomas Ploetz — although our research interests didn't end up aligning, he was a mentor and friend whose influence is a big part of why I chose to pursue a PhD in the first place.
Research Highlights
We train LLMs to anchor atomic reasoning behaviors (*continue*, *stop*, *answer*) to special token sequences called "behavior cues," making chain-of-thought reasoning monitorable and externally controllable. The method cuts wasted reasoning tokens by up to 50% under an efficiency objective and more than doubles the success rate under a safety-constrained objective, recovering safe actions from otherwise-unsafe reasoning traces 80% of the time. Evaluated on Qwen3-8B and GLM-Z1-9B across AIME, TextWorld, and HazardWorld.
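As a rough illustration of the external-control idea, here's a minimal sketch of a controller watching a reasoning stream for cue tokens. The cue strings, token budget, and toy stream below are hypothetical placeholders, not the trained cues or controller from the paper:

```python
# Illustrative sketch: externally monitoring a reasoning stream for behavior cues.
# The cue strings and controller policy below are hypothetical placeholders.

CUES = {"<|continue|>", "<|stop|>", "<|answer|>"}

def monitor_stream(token_stream, max_reasoning_tokens=256):
    """Watch a token stream for behavior cues and decide when to intervene."""
    trace = []
    for i, token in enumerate(token_stream):
        trace.append(token)
        if token in ("<|stop|>", "<|answer|>"):
            # The controller sees the cue and can act on it from the outside,
            # e.g. cut off reasoning as soon as the model signals it is done.
            return trace, "model_signaled_finish"
        if i >= max_reasoning_tokens:
            # Efficiency objective: stop burning tokens past a fixed budget.
            return trace, "budget_exceeded"
    return trace, "stream_ended"

# Toy usage with a fake stream standing in for model output.
fake_stream = iter(["Let", "me", "think", "<|stop|>", "final", "answer"])
trace, reason = monitor_stream(fake_stream)
print(reason, trace)
```

The point is only that, once behaviors are anchored to visible cues, the decision to keep reasoning, stop, or answer can live outside the model.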
A progressive stress test of LLMs' baseline agentic capabilities — 122 tasks, 49 models benchmarked. Reveals systematic gaps in current LLMs' ability to act as agents in interactive text environments. Want to see if you can beat an LLM at Zork1? Try it on the TALES page (Environments tab).
Using foundation models to conduct scalable, interactive oral assessments. The tool has since garnered interest from 15+ professors across 7 universities and has served over 1,000 students. Try a demo at socraticmind.com.
Check out the other cool stuff my labmates are working on!