Ken Gu
I work on the evaluation and development of language models and agents, with a focus on scientific and technical domains. I care about building benchmarks that measure reasoning, and agents that can plan and reliably carry out open-ended tasks.
I am currently a Member of Technical Staff at xAI working on language model reasoning, and a PhD student in Computer Science at the University of Washington (on leave), advised by Tim Althoff. I have also conducted research at Google, Salesforce, and Microsoft Research.
Selected Publications
2026
2025
2024
Updates
| Date | Update |
|---|---|
| Jan 2026 | Two papers from my PhD are accepted to ICLR 🇧🇷 and CHI 🇪🇸! |
| Nov 2025 | |
| Sep 2025 | An update from my year at Google 🚀 I led two exciting projects: RADAR, a dataset advancing Gemini's tabular and data science reasoning, recently accepted at NeurIPS; and our 145-page Personal Health Agent paper, which introduces a multi-agent framework that integrates data from wearables and personal health records to drive personalized health insights. |
| Oct 2024 | 🎙️ Gave an invited talk on BLADE at AI2. I enjoyed the insightful discussions that followed, especially on how we can approach evaluation for data-driven science and open-ended tasks. |
| Sep 2024 | 🍂 Thrilled to start an internship at |
| Jan 2024 | 🧙 Excited to share that two papers on understanding human-AI collaboration in data science have been accepted to CHI 2024!! One stems from my internship with Microsoft Research last summer, and the other is a Wizard-of-Oz study conducted with collaborators at UW, where we acted as LLM data analysis assistants. |
| Jun 2023 | 🏔 Started my internship at |
| Apr 2023 | 🇩🇪 Attended CHI 2023 in Hamburg, Germany! This was my first in-person conference and my first time in Europe! |