Aashiq Muhamed

Ph.D. Student in Computer Science, Carnegie Mellon University


I am a Ph.D. student in the CMU Machine Learning Department, advised by Professors Mona Diab and Virginia Smith. I also closely collaborate with Professor Andrew Ilyas.

My research develops the science and safety infrastructure for autonomous agentic systems. I work on both foundational interpretability methods for understanding how models reason internally and applied tools for monitoring and governing agents in deployment. I’m particularly interested in:

  • Foundations of Mechanistic Interpretability: Developing SAEs, circuit analysis, and interpretability methods to understand model internals—including feature consistency, rare concept detection, unlearning, and the geometry of learned representations.
  • Interpretability of Reasoning Models: Understanding how reasoning capabilities emerge and can be intervened on during training across paradigms (SFT, RLVR, and beyond), and what drives robustness in finetuned reasoning models.
  • Agent Safety Monitoring and Evaluation: What makes an effective safety monitor? I study chain-of-thought monitoring and mechanistic monitors, and benchmark their robustness against direct misuse and evasion.
  • Multi-Agent Collusion and Coordination: Using mechanistic tools to detect and understand collusion, deception, and covert coordination in multi-agent systems.
  • Tamper Resistance and Open-Weight Safety: Making safety properties robust in open-weight models where adversaries have full access to modify weights.

I am supported by the Amazon PhD Fellowship, the Anthropic Fellows Program, the Siebel Scholars Foundation, the Cooperative AI PhD Fellowship, Longview Philanthropy, CAIS, SPAR, and MATS.

background

I bring an interdisciplinary perspective, with degrees spanning engineering, language technologies, and machine learning. I earned my B.Tech in Mechanical Engineering from the Indian Institute of Technology, Roorkee, where I was awarded the President’s Gold Medal. I subsequently completed an MS in Mechanical Engineering at Stanford University and an MS in Language Technologies from the CMU Language Technologies Institute.

Before beginning my Ph.D., I spent five years in industry as an Applied Scientist at Amazon, working across diverse AI applications including AWS DeepComposer (2019-2021), Amazon Search M5 (2021-2022), and AWS AI (2022-2023). This industry experience deeply informs my research, which emphasizes systems that are both theoretically grounded and practically deployable.

I’m always open to collaboration — feel free to reach out.

news

Feb 6, 2026 Honored to be selected as a Cooperative AI PhD Fellow for 2026, ranked among the top 5% of applicants!
Jan 9, 2026 Excited to be admitted to the 2026 cohort of the International Programme on AI Evaluation: Capabilities and Safety, selected among the top 2% of applicants globally! I’ll be joining a diverse group working on rigorous methods for evaluating AI capabilities and safety.
Nov 30, 2025 Excited to be selected as an Anthropic Fellow for AI safety research starting January 2026!
Oct 21, 2025 Delighted to be named an Amazon AI PhD Fellow! I’ll be working on monitoring and post-training for AI agents.
Aug 1, 2025 I’m excited to be a mentor for SPAR 2025 this Fall! I’ll be mentoring students on using SAEs for interpretable and tamper-resistant alignment. If you’re interested in advancing AI safety through hands-on interpretability research, please apply!
Jul 1, 2025 I am a visiting researcher at the University of California, Berkeley this summer, hosted by Prof. Dawn Song and Xuandong Zhao.
Jun 1, 2025 Working as a FIG Fellow with Chi Nguyen, Caspar Oesterheld, and Emery Cooper on “Training AIs to Aid Decision Theory and Acausal Research” through the Future Impact Group’s Philosophy for Safe AI program.
May 2, 2025 I’m excited to present two papers at NAACL 2025: “CoRAG: Collaborative Retrieval-Augmented Generation” and “Decoding Dark Matter: Specialized Sparse Autoencoders for Interpreting Rare Concepts in Foundation Models”, with the latter also featuring an oral presentation at the TrustNLP Workshop.
Oct 16, 2024 Our work “Inducing Elasticity in Foundation Models: Post-Training Techniques for Adaptable Inference” was accepted at the 4th Workshop on Efficient Natural Language and Speech Processing @ NeurIPS 2024. We study weight decomposition approaches to induce elasticity in pretrained LLMs.
Oct 15, 2024 I will be presenting my MATS project, “Decoding Dark Matter: Specialized Sparse Autoencoders for Interpreting Rare Concepts in LLMs,” at the Second NeurIPS Workshop on Attributing Model Behavior at Scale. Our results show that SSAEs Pareto-dominate pretrained SAEs within specific subdomains, with promising implications for broader applications in AI safety.
Sep 1, 2024 Delighted to have been named a Siebel Scholar 2025.
Aug 1, 2024 Spending the summer at Berkeley as an ML Alignment & Theory Scholar (MATS) working with Lucius Bushnaq and Jake Mendel from Apollo Research. Excited about pushing the frontiers of mechanistic interpretability.
Jul 20, 2024 We will present our work “GRASS: Compute Efficient Low-Memory LLM Training with Structured Sparse Gradients” at EMNLP 2024 and the Efficient Systems for Foundation Models Workshop @ ICML 2024.
Mar 5, 2024 We will present our work “Fed Up with Complexity: Simplifying Many-Task Federated Learning with NTKFedAvg”, and “Cache Me If You Can: The Case For Retrieval Augmentation in Federated Learning” at the Privacy Regulation and Protection in Machine Learning Workshop @ ICLR 2024.
Mar 1, 2024 Our work “Less is Fed More: Sparsity Reduces Feature Distortion in Federated Learning” was accepted at the Modular and Multilingual NLP Workshop at EACL 2024.
Feb 27, 2024 Our work “Adversarial Continuous Text to Image Generation” has been accepted to CVPR 2024!
Dec 20, 2023 We released An In-depth Look at Gemini’s Language Abilities, an impartial, in-depth, and reproducible study comparing Gemini, GPT, and Mixtral.
Dec 20, 2023 Our solution was the winning entry at the 1st Privacy Preserving Federated Learning Document VQA competition at NeurIPS 2023.
Sep 1, 2023 Started my MS/Ph.D. at CMU!

selected publications

  1. EACL, NeurIPS LLM Eval
    Aashiq Muhamed, Leonardo F. R. Ribeiro, Markus Dreyer, and 2 more authors
    In EACL 2026, 2026
  2. NeurIPS MechInterp Spotlight
    Xiangchen Song*, Aashiq Muhamed*, Yujia Zheng, and 5 more authors
    In Mechanistic Interpretability Workshop at NeurIPS, 2025
  3. NeurIPS MechInterp Spotlight
    Towards a Mechanistic Understanding of Robustness in Finetuned Reasoning Models
    Aashiq Muhamed, Xuandong Zhao, Mona T. Diab, and 2 more authors
    In Mechanistic Interpretability Workshop at NeurIPS, 2025
  4. COLM, ICML Actionable Interp, ICML R2FM
    SAEs Can Improve Unlearning: Dynamic Sparse Autoencoder Guardrails for Precision Unlearning in LLMs
    Aashiq Muhamed, Jacopo Bonato, Mona Diab, and 1 more author
    In Conference on Language Modeling (COLM), 2025