[Proposal] Allow overriding config.n_ctx at model initialization #1006

@tuomaso

Proposal

Add an optional argument that overrides cfg.n_ctx when initializing the model, making it possible to override the default n_ctx value provided in from_pretrained.py. This would allow using transformer_lens to study outputs at longer context lengths, since many modern models are trained with a much larger context length than the default provided in cfg. For example, Llama-3.2 models have a context length of 128k, but TransformerLens currently limits usage to 2048.

Motivation

This would enable studying longer-context interactions with TransformerLens, which are increasingly important in modern LLMs. While I understand that the default context lengths are kept short to save memory, I don't think there is currently a way to load a model with a larger context window than the one provided in cfg without directly modifying the source code. n_ctx has to be specified during model init, as changing it after init causes issues due to positional embedding initializations. See the issues below for discussion:

#842 #961
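
To illustrate why changing n_ctx after init is not enough, here is a toy sketch in plain Python. It assumes that position-dependent tables are allocated once, from n_ctx, when the model is constructed. ToyPositionalTable is a hypothetical stand-in, not a TransformerLens class.

```python
class ToyPositionalTable:
    """Hypothetical stand-in for a module whose positional table is sized at init."""

    def __init__(self, n_ctx: int, d_model: int):
        self.n_ctx = n_ctx
        # One row per position, allocated once at construction time.
        self.table = [[0.0] * d_model for _ in range(n_ctx)]

    def lookup(self, position: int):
        # Fails for any position the table was never allocated for.
        return self.table[position]


model = ToyPositionalTable(n_ctx=4, d_model=2)

# Changing the attribute afterwards does not resize the table built at init.
model.n_ctx = 8

lookup_failed = False
try:
    model.lookup(6)
except IndexError:
    lookup_failed = True
```

In this sketch the table still has 4 rows after n_ctx is set to 8, so positions beyond the original length cannot be looked up. This is why the override needs to be applied before the model is built.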

A similar change was previously proposed in #491, but that PR seems to be abandoned.

Pitch

Desired interface:
transformer_lens.HookedTransformer.from_pretrained("meta-llama/Llama-3.2-3B", n_ctx=32768)

If n_ctx is passed, it should override the n_ctx specified in cfg; otherwise the n_ctx from the config is used. Optionally, add a check for whether the specified n_ctx is larger than what the model was trained on.
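
A minimal sketch of the override logic I have in mind, written against a toy config so it runs without TransformerLens. ToyConfig, its trained_n_ctx field, and apply_n_ctx_override are hypothetical names used only for illustration. The choice to warn (rather than raise) when the requested length exceeds the trained length is also an assumption.

```python
import warnings
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ToyConfig:
    # Hypothetical stand-in for the model config; only the fields needed here.
    n_ctx: int          # default context length from the pretrained config
    trained_n_ctx: int  # hypothetical field: context length the model was trained with


def apply_n_ctx_override(cfg: ToyConfig, n_ctx: Optional[int] = None) -> ToyConfig:
    """Return cfg unchanged if n_ctx is None, else a copy with n_ctx overridden."""
    if n_ctx is None:
        # No override requested: keep the default from the config.
        return cfg
    if n_ctx <= 0:
        raise ValueError(f"n_ctx must be positive, got {n_ctx}")
    if n_ctx > cfg.trained_n_ctx:
        # Optional check: the model was never trained at this length.
        warnings.warn(
            f"n_ctx={n_ctx} exceeds the trained context length "
            f"({cfg.trained_n_ctx}); outputs may degrade."
        )
    # Apply the override before the model is built from the config.
    return replace(cfg, n_ctx=n_ctx)


# Illustrative values only: a 2048 default and a 128k (131072) trained length.
default_cfg = ToyConfig(n_ctx=2048, trained_n_ctx=131072)
long_cfg = apply_n_ctx_override(default_cfg, n_ctx=32768)
```

Applying the override to the config before construction means everything sized from n_ctx is built at the requested length, and calls that omit n_ctx behave exactly as they do today.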

Alternatives

I can't think of any alternative way to load models with a larger context window. Currently I have to use a local copy of TransformerLens and edit the cfg source code directly to run these models with a larger context length.

Checklist

  • I have checked that there is no similar issue in the repo (required)
