Ken Gu




I work on the evaluation and development of language models and agents, with a focus on scientific and technical domains. I care about building benchmarks that measure reasoning, and agents that can plan and reliably carry out open-ended tasks.

I am currently a Member of Technical Staff at xAI working on language model reasoning, and a PhD student in Computer Science at the University of Washington (on leave), advised by Tim Althoff. I have also conducted research at Google, Salesforce, and Microsoft Research.


Selected Publications

2026

  1. SynthWorlds: Controlled Parallel Worlds for Disentangling Reasoning and Knowledge in Language Models
    Ken Gu, Advait Bhat, Mike A. Merrill, Robert West, Xin Liu, and 2 more authors
    ICLR 2026
  2. Completion ≠ Collaboration: Scaling Collaborative Effort with Agents
    Shannon Zejiang Shen*, Valerie Chen*, Ken Gu, Alexis Ross, Zixian Ma, and 9 more authors
    ACL Findings 2026
    🏅 NeurIPS Workshop on Responsible Foundation Models 2025 Best Paper
  3. Towards a Science of Scaling Agent Systems
    Yubin Kim, Ken Gu, Chanwoo Park, Chunjong Park, Samuel Schmidgall, and 14 more authors
    Preprint 2026

2025

  1. RADAR: Benchmarking Language Models on Imperfect Tabular Data
    Ken Gu, Zhihan Zhang, Kate Lin, Yuwei Zhang, Akshay Paruchuri, and 16 more authors
    NeurIPS 2025
    Integrated in Gemini

2024

  1. BLADE: Benchmarking Language Model Agents for Data-Driven Science
    Ken Gu, Ruoxi Shang, Ruien Jiang, Keying Kuang, Richard-John Lin, and 11 more authors
    EMNLP 2024

Updates

Jan 2026 Two papers from my PhD were accepted to ICLR 🇧🇷 and CHI 🇪🇸!
Nov 2025 I joined xAI as a Member of Technical Staff.
Sep 2025 Update from my year at Google 🚀 I led two exciting projects: RADAR, a dataset advancing Gemini’s tabular and data science reasoning that was recently accepted at NeurIPS, and our 145-page Personal Health Agent paper, which introduces a multi-agent framework that integrates data from wearables and personal health records to drive personalized health insights.
Oct 2024 🎙️ Gave an invited talk on BLADE at AI2. I enjoyed the insightful discussions that followed, especially on how we can approach evaluation for data-driven science and open-ended tasks.
Sep 2024 🍂 Thrilled to start an internship at Google Research, focusing on agents for personal health data and building upon insights from our BLADE benchmark!
Jan 2024 🧙 Excited to share that two papers on understanding human-AI collaboration in data science have been accepted to CHI 2024!! One stems from my internship with Microsoft Research last summer, and the other is a Wizard-of-Oz study conducted with collaborators at UW, where we acted as LLM data analysis assistants.
Jun 2023 🏔 Started my internship at Microsoft Research with Chenglong Wang and Steven Drucker!
Apr 2023 🇩🇪 Attended CHI 2023 in Hamburg, Germany! This was my first in-person conference and my first time in Europe!