The Stack of Preprints
-
We train LLMs to anchor atomic reasoning behaviors — *continue*, *stop*, *answer* — to special token sequences called "behavior cues," making chain-of-thought reasoning monitorable and externally controllable. The method cuts wasted reasoning tokens by up to 50% under an efficiency objective and more than doubles success rate under a safety-constrained objective by recovering safe actions from otherwise-unsafe reasoning traces 80% of the time. Evaluated on Qwen3-8B and GLM-Z1-9B across AIME, TextWorld, and HazardWorld.
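The external-control side of the idea can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the cue strings below are made-up stand-ins for the special token sequences, and the only behavior enforced is *stop*.

```python
# Illustrative behavior-cue monitor; cue strings are assumptions,
# not the paper's actual special tokens.
CUES = ("<continue>", "<stop>", "<answer>")

def first_cue(trace):
    """Return (position, cue) for the earliest behavior cue in a
    chain-of-thought trace, or None if no cue fired."""
    hits = [(trace.index(c), c) for c in CUES if c in trace]
    return min(hits) if hits else None

def enforce(trace):
    """Externally enforce *stop*: discard reasoning emitted after a
    stop cue, cutting the wasted tokens the blurb refers to."""
    hit = first_cue(trace)
    if hit is not None and hit[1] == "<stop>":
        return trace[:hit[0]]
    return trace
```

Because the cues are ordinary token sequences in the output, the same scan doubles as a monitor (which behavior fired, and where) and as a control hook (truncate, redirect, or force an answer).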
-
A progressive stress test of LLMs' baseline agentic capabilities — 122 tasks, 49 models benchmarked. Reveals systematic gaps in current LLMs' ability to act as agents in interactive text environments. Want to see if you can beat an LLM at Zork1? Try it on the TALES page (Environments tab).
-
Catches copy-paste cheaters by hiding invisible byte-level watermarks in generated/displayed answers — when those characters appear in student submissions, you have direct evidence the answer was lifted from an unauthorized source.
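The general trick can be shown with zero-width Unicode characters, though the paper's actual byte-level scheme is not reproduced here; the tag format and encoding below are illustrative assumptions.

```python
# Illustrative zero-width watermark: invisible on screen, but the
# bytes survive a copy-paste into a student's submission.
ZW0, ZW1 = "\u200b", "\u200c"  # zero-width space / zero-width non-joiner

def embed(text, tag):
    """Append the tag's bits as invisible zero-width characters."""
    bits = "".join(f"{byte:08b}" for byte in tag.encode("utf-8"))
    return text + "".join(ZW1 if b == "1" else ZW0 for b in bits)

def detect(submission):
    """Recover the tag if the invisible watermark was pasted along
    with the visible answer text; return None otherwise."""
    bits = "".join("1" if c == ZW1 else "0"
                   for c in submission if c in (ZW0, ZW1))
    usable = len(bits) - len(bits) % 8
    if usable == 0:
        return None
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None
```

The visible text is unchanged by `embed`, so the watermark does not tip off the copier; finding the tag in a submission is the "direct evidence" the blurb describes.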
-
Building an AI to evaluate student understanding through interactive oral interviews — making oral assessment scalable at the course level. Winner of the 2023–24 Tools Competition in Educational Technologies. Try a demo at [socraticmind.com](https://socraticmind.com).
-
An extension of Examinator v3 using answer-content timestamps and decision-tree classifiers to flag likely cheaters in online take-home exams.
-
Traditional ML methods can differentiate between human and LLM-generated answers for specific questions given a ground-truth corpus of both. We train classifiers on past student submissions to provide a privacy-respecting baseline for detecting AI-assisted essay writing in classrooms.
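A stdlib-only stand-in for the kind of classifier involved: nearest-centroid over character trigram counts, trained on a corpus of each class. The features, similarity measure, and the tiny corpus in the test are illustrative; the paper trains on real past student submissions per question.

```python
from collections import Counter
import math

def trigrams(text):
    """Character-trigram counts, a crude but privacy-friendly feature."""
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))

def centroid(texts):
    """Sum of trigram counts over one class's corpus."""
    total = Counter()
    for t in texts:
        total += trigrams(t)
    return total

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(answer, human_centroid, llm_centroid):
    """Assign the answer to whichever class centroid it is closer to."""
    q = trigrams(answer)
    if cosine(q, human_centroid) >= cosine(q, llm_centroid):
        return "human"
    return "llm"
```

The point of such a baseline is that it needs only the course's own submission history, with no external detection service seeing student work.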
-
Quickly trains role-playing RL agents in open-world text games by combining, via an attention-based mixture of experts, RL agents pre-trained to play other roles. Demonstrates rapid few-shot transfer of behaviors across characters and tasks.
-
A "thespian agent" framework that can emulate multiple characters in text-adventure environments, using a soft prompt for direction and an attention mechanism for few-shot learning. Surpasses prior state-of-the-art in multi-character and few-shot learning.
-
Detects cheating in online take-home exams by comparing answers and the timestamps at which they were entered. A web interface enables efficient manual inspection. Analyzed 915,831 pairs of exam submissions across three courses over two semesters at a top U.S. institution, identifying 46 instances of cheating.
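The core pairwise check is easy to sketch. The field names and the similarity/time-window thresholds below are illustrative assumptions, not the paper's actual values.

```python
from difflib import SequenceMatcher
from itertools import combinations

def suspicious_pairs(submissions, sim_threshold=0.9, window_s=120):
    """Flag student pairs whose answers are near-identical AND were
    entered within a short time window of each other. Each submission
    is a dict with 'student', 'answer', and 'ts' (seconds) keys."""
    flagged = []
    for a, b in combinations(submissions, 2):
        sim = SequenceMatcher(None, a["answer"], b["answer"]).ratio()
        dt = abs(a["ts"] - b["ts"])
        if sim >= sim_threshold and dt <= window_s:
            flagged.append((a["student"], b["student"], sim, dt))
    return flagged
```

Requiring both signals at once is what keeps the flag rate low at scale: identical short answers are common, and near-simultaneous entry is common, but their conjunction on long answers is rare enough to hand to a human reviewer.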
-
A method for story plot generation that combines causal planning with neural language models. By using commonsense knowledge to recursively expand story plots in a backward-chaining fashion, the system improves narrative coherence over strong baselines.
-
Reward design for RL agents is hard when *how* a goal is achieved matters — common-sense behavior, specific preferences. Story Shaping helps agents infer and align their actions with tacit knowledge from exemplar stories, using knowledge graphs to generate intrinsic rewards based on similarity between agent actions and the story world.
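The reward idea can be sketched by treating both the exemplar story and the agent's trajectory as sets of knowledge-graph triples and rewarding overlap. Jaccard similarity and the example triples are illustrative choices, not the paper's exact formulation.

```python
# Hypothetical intrinsic-reward sketch: score the agent's knowledge
# graph against the one extracted from the exemplar story.
def intrinsic_reward(agent_kg, story_kg):
    """Jaccard similarity between two sets of (subject, relation,
    object) triples; higher when the agent's world state looks more
    like the story world."""
    if not agent_kg and not story_kg:
        return 0.0
    return len(agent_kg & story_kg) / len(agent_kg | story_kg)

story = {("knight", "has", "sword"), ("knight", "slays", "dragon")}
before = set()
after = {("knight", "has", "sword")}
```

An action that moves the agent's graph toward the story's (here, picking up the sword) raises the reward, so the tacit "how" of the exemplar shapes behavior without hand-written reward terms.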