Proposal
Add an optional argument to override cfg.n_ctx when initializing the model, making it possible to override the default n_ctx value provided in from_pretrained.py. This would allow using transformer_lens to study longer-context outputs, since many modern models are trained with a much larger context length than the default provided in cfg; e.g. the Llama-3.2 models have a context length of 128k, but TransformerLens currently limits usage to 2048.
Motivation
This would enable studying longer-context interactions using TransformerLens, which are increasingly important in modern LLMs. While I understand that default context lengths are kept short to save memory, there is currently no way to load a model with a larger context window than the one provided in cfg without modifying the source code directly. n_ctx has to be specified during model init, as changing it after init causes issues with the positional embedding initialization. See #842 and #961 for discussion.
A similar change was previously proposed in #491, but that PR appears to be abandoned.
Pitch
Desired interface:
transformer_lens.HookedTransformer.from_pretrained("meta-llama/Llama-3.2-3B", n_ctx=32768)
If provided, this should override the n_ctx specified in cfg; otherwise, the n_ctx from the config is used as before. Optionally, add a check for whether the specified n_ctx is larger than the context length the model was trained on.
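A minimal sketch of what the override could look like, assuming the config is resolved via loading_from_pretrained.get_pretrained_model_config as in the current loading flow (the function name and the check below are illustrative, not the actual implementation):

```python
# Illustrative sketch only -- not the actual TransformerLens implementation.
from typing import Optional
import warnings

import transformer_lens.loading_from_pretrained as loading
from transformers import AutoConfig

def resolve_config(model_name: str, n_ctx: Optional[int] = None):
    """Resolve the pretrained config, optionally overriding n_ctx before init."""
    cfg = loading.get_pretrained_model_config(model_name)
    if n_ctx is not None:
        # Optional check from the pitch: warn if the requested window exceeds
        # the context length the model was trained on, read here from the HF
        # config (max_position_embeddings is 131072 for Llama-3.2).
        trained_n_ctx = AutoConfig.from_pretrained(model_name).max_position_embeddings
        if n_ctx > trained_n_ctx:
            warnings.warn(
                f"n_ctx={n_ctx} exceeds the trained context length {trained_n_ctx}."
            )
        # The override has to happen before HookedTransformer.__init__, since
        # position-dependent buffers are sized from cfg.n_ctx at init.
        cfg.n_ctx = n_ctx
    return cfg
```

from_pretrained would then build the model from this config before loading the pretrained weights, so the override takes effect ahead of any positional-embedding initialization.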
Alternatives
I can't think of an alternative way to enable loading models with a larger context window. Currently I have to keep a local copy of TransformerLens and edit the cfg in the source directly to run these models with a longer context. A fragile in-process variant of that same edit is sketched below.
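For reference, the workaround can be done as a monkey-patch instead of a local source copy. This is included only to illustrate the gap: it assumes from_pretrained looks up get_pretrained_model_config on the loading module at call time, and it only helps for rotary-position models like Llama (models with learned positional embeddings would likely still hit a weight-shape mismatch when the pretrained state dict is loaded):

```python
import transformer_lens
import transformer_lens.loading_from_pretrained as loading

# Monkey-patch sketch: wrap the config loader so n_ctx is overridden before
# HookedTransformer allocates its position-dependent buffers.
_orig_get_config = loading.get_pretrained_model_config

def _patched_get_config(*args, **kwargs):
    cfg = _orig_get_config(*args, **kwargs)
    cfg.n_ctx = 32768  # desired context length
    return cfg

loading.get_pretrained_model_config = _patched_get_config
model = transformer_lens.HookedTransformer.from_pretrained("meta-llama/Llama-3.2-3B")
```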
Checklist