The Stack of Preprints
-
We train LLMs to anchor atomic reasoning behaviors — *continue*, *stop*, *answer* — to special token sequences called "behavior cues," making chain-of-thought reasoning monitorable and externally controllable. The method cuts wasted reasoning tokens by up to 50% under an efficiency objective and more than doubles success rate under a safety-constrained objective by recovering safe actions from otherwise-unsafe reasoning traces 80% of the time. Evaluated on Qwen3-8B and GLM-Z1-9B across AIME, TextWorld, and HazardWorld.
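The external-control side of the idea can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the cue strings below are made-up stand-ins for the special token sequences, and the only behavior enforced is *stop*.

```python
# Illustrative behavior-cue monitor; cue strings are assumptions,
# not the paper's actual special tokens.
CUES = ("<continue>", "<stop>", "<answer>")

def first_cue(trace):
    """Return (position, cue) for the earliest behavior cue in a
    chain-of-thought trace, or None if no cue fired."""
    hits = [(trace.index(c), c) for c in CUES if c in trace]
    return min(hits) if hits else None

def enforce(trace):
    """Externally enforce *stop*: discard reasoning emitted after a
    stop cue, cutting the wasted tokens the blurb refers to."""
    hit = first_cue(trace)
    if hit is not None and hit[1] == "<stop>":
        return trace[:hit[0]]
    return trace
```

Because the cues are ordinary token sequences in the output, the same scan doubles as a monitor (which behavior fired, and where) and as a control hook (truncate, redirect, or force an answer).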
-
A progressive stress test of LLMs' baseline agentic capabilities — 122 tasks, 49 models benchmarked. Reveals systematic gaps in current LLMs' ability to act as agents in interactive text environments. Want to see if you can beat an LLM at Zork1? Try it on the TALES page (Environments tab).
-
Catches copy-paste cheaters by hiding invisible byte-level watermarks in generated/displayed answers — when those characters appear in student submissions, you have direct evidence the answer was lifted from an unauthorized source.
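The general trick can be shown with zero-width Unicode characters, though the paper's actual byte-level scheme is not reproduced here; the tag format and encoding below are illustrative assumptions.

```python
# Illustrative zero-width watermark: invisible on screen, but the
# bytes survive a copy-paste into a student's submission.
ZW0, ZW1 = "\u200b", "\u200c"  # zero-width space / zero-width non-joiner

def embed(text, tag):
    """Append the tag's bits as invisible zero-width characters."""
    bits = "".join(f"{byte:08b}" for byte in tag.encode("utf-8"))
    return text + "".join(ZW1 if b == "1" else ZW0 for b in bits)

def detect(submission):
    """Recover the tag if the invisible watermark was pasted along
    with the visible answer text; return None otherwise."""
    bits = "".join("1" if c == ZW1 else "0"
                   for c in submission if c in (ZW0, ZW1))
    usable = len(bits) - len(bits) % 8
    if usable == 0:
        return None
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None
```

The visible text is unchanged by `embed`, so the watermark does not tip off the copier; finding the tag in a submission is the "direct evidence" the blurb describes.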
-
Building an AI to evaluate student understanding through interactive oral interviews — making oral assessment scalable at the course level. Winner of the 2023–24 Tools Competition in Educational Technologies. Try a demo at [socraticmind.com](https://socraticmind.com).
-
An extension of Examinator v3 using answer-content timestamps and decision-tree classifiers to flag likely cheaters in online take-home exams.
-
Traditional ML methods can differentiate between human and LLM-generated answers for specific questions given a ground-truth corpus of both. We train classifiers on past student submissions to provide a privacy-respecting baseline for detecting AI-assisted essay writing in classrooms.
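A stdlib-only stand-in for the kind of classifier involved: nearest-centroid over character trigram counts, trained on a corpus of each class. The features, similarity measure, and the tiny corpus in the test are illustrative; the paper trains on real past student submissions per question.

```python
from collections import Counter
import math

def trigrams(text):
    """Character-trigram counts, a crude but privacy-friendly feature."""
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))

def centroid(texts):
    """Sum of trigram counts over one class's corpus."""
    total = Counter()
    for t in texts:
        total += trigrams(t)
    return total

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(answer, human_centroid, llm_centroid):
    """Assign the answer to whichever class centroid it is closer to."""
    q = trigrams(answer)
    if cosine(q, human_centroid) >= cosine(q, llm_centroid):
        return "human"
    return "llm"
```

The point of such a baseline is that it needs only the course's own submission history, with no external detection service seeing student work.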
-
Quickly trains role-playing RL agents in open-world text games by combining, via an attention-based mixture of experts, RL agents pre-trained to play other roles. Demonstrates rapid few-shot transfer of behaviors across characters and tasks.
-
A "thespian agent" framework that can emulate multiple characters in text-adventure environments, using a soft prompt for direction and an attention mechanism for few-shot learning. Surpasses prior state-of-the-art in multi-character and few-shot learning.
-
Detects cheating in online take-home exams by comparing answers and the timestamps at which they were entered. A web interface enables efficient manual inspection. Analyzed 915,831 pairs of exam submissions across three courses over two semesters at a top U.S. institution, identifying 46 instances of cheating.
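The core pairwise check is easy to sketch. The field names and the similarity/time-window thresholds below are illustrative assumptions, not the paper's actual values.

```python
from difflib import SequenceMatcher
from itertools import combinations

def suspicious_pairs(submissions, sim_threshold=0.9, window_s=120):
    """Flag student pairs whose answers are near-identical AND were
    entered within a short time window of each other. Each submission
    is a dict with 'student', 'answer', and 'ts' (seconds) keys."""
    flagged = []
    for a, b in combinations(submissions, 2):
        sim = SequenceMatcher(None, a["answer"], b["answer"]).ratio()
        dt = abs(a["ts"] - b["ts"])
        if sim >= sim_threshold and dt <= window_s:
            flagged.append((a["student"], b["student"], sim, dt))
    return flagged
```

Requiring both signals at once is what keeps the flag rate low at scale: identical short answers are common, and near-simultaneous entry is common, but their conjunction on long answers is rare enough to hand to a human reviewer.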
-
A method for story plot generation that combines causal planning with neural language models. By using commonsense knowledge to recursively expand story plots in a backward-chaining fashion, the system improves narrative coherence over strong baselines.
-
Reward design for RL agents is hard when *how* a goal is achieved matters — common-sense behavior, specific preferences. Story Shaping helps agents infer and align their actions with tacit knowledge from exemplar stories, using knowledge graphs to generate intrinsic rewards based on similarity between agent actions and the story world.
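The reward idea can be sketched by treating both the exemplar story and the agent's trajectory as sets of knowledge-graph triples and rewarding overlap. Jaccard similarity and the example triples are illustrative choices, not the paper's exact formulation.

```python
# Hypothetical intrinsic-reward sketch: score the agent's knowledge
# graph against the one extracted from the exemplar story.
def intrinsic_reward(agent_kg, story_kg):
    """Jaccard similarity between two sets of (subject, relation,
    object) triples; higher when the agent's world state looks more
    like the story world."""
    if not agent_kg and not story_kg:
        return 0.0
    return len(agent_kg & story_kg) / len(agent_kg | story_kg)

story = {("knight", "has", "sword"), ("knight", "slays", "dragon")}
before = set()
after = {("knight", "has", "sword")}
```

An action that moves the agent's graph toward the story's (here, picking up the sword) raises the reward, so the tacit "how" of the exemplar shapes behavior without hand-written reward terms.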