Steering Code LLMs with Activation Directions for Language and Library Control

Md Mahbubur Rahman, Arjun Guha, and Harshitha Menon, 2026

Code LLMs often default to particular programming languages and libraries when prompts do not specify an ecosystem. We study whether these preferences are represented by approximately linear activation directions that can be edited at generation time.

Using a difference-in-means method, we estimate layer-wise steering vectors for language and library pairs and add them to the model's hidden states. Across three open-weight code models, these interventions substantially shift generation toward the target ecosystem under neutral prompts, and they often remain influential even when the prompt explicitly requests the opposite choice. The results suggest that code style preferences are partly represented by compact, steerable structure in activation space.
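To make the difference-in-means idea concrete, the following is a minimal sketch rather than the paper's implementation. It assumes activations have already been collected into arrays of shape (prompts, layers, hidden size); the function names, the scale `alpha`, and the synthetic data are hypothetical and chosen only for illustration.

```python
import numpy as np


def diff_in_means(target_acts, source_acts):
    """Per-layer steering vectors: mean target activation minus mean source activation.

    Both inputs have shape (n_prompts, n_layers, d_model); the result has
    shape (n_layers, d_model), one direction per layer.
    """
    return target_acts.mean(axis=0) - source_acts.mean(axis=0)


def steer(hidden, directions, layer, alpha=1.0):
    """Add the scaled direction for `layer` to every token's hidden state.

    `hidden` has shape (n_tokens, d_model). `alpha` is an assumed scaling
    knob, not a value taken from the paper.
    """
    return hidden + alpha * directions[layer]


# Synthetic stand-ins for activations recorded on two prompt sets
# (for example, prompts answered in language A versus language B).
rng = np.random.default_rng(0)
n_prompts, n_layers, d_model = 200, 4, 8

offset = np.zeros(d_model)
offset[0] = 3.0  # the two sets differ along one axis by construction

source_acts = rng.normal(size=(n_prompts, n_layers, d_model))
target_acts = rng.normal(size=(n_prompts, n_layers, d_model)) + offset

directions = diff_in_means(target_acts, source_acts)

# Steer the hidden states of a 5-token sequence at layer 2.
hidden = rng.normal(size=(5, d_model))
steered = steer(hidden, directions, layer=2, alpha=2.0)

# Every token's projection onto the layer-2 direction grows by the same amount.
shift = (steered - hidden) @ directions[2]
```

In a real model the addition would typically happen inside a forward hook on the chosen layer's output; that wiring is omitted here because it depends on the framework and model in use.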

PDF available on arXiv

@misc{rahman:steering-code-llms,
title = {Steering Code {LLMs} with Activation Directions for Language and Library Control},
author = {Md Mahbubur Rahman and Arjun Guha and Harshitha Menon},
year = {2026},
}