Conversation
Added rnn architecture type
damian1996
left a comment
There was a problem hiding this comment.
Are you sure we are need Architecture class at all? Probably model_name and provider_name can identify the model for us and I'm not sure if we need information about underlying architecture from evaluation logging point of view.
|
I think the way things are setup - model_name is sufficient information for a checkpoint, and there will be edge-cases like Nemotron-H which are hybrid and this class will bloat up without adding other information. I think it might be okay to remove it completely. |
|
In my opinion, this full class is not very helpful. Architecture like Harsha mentioned can be probably removed completely. Number of parameters often is included in model_name directly. If not, there probably this information is not necessary. Context_window/instruct also is dependant on the model directly (instruct often is given in model name as well). |
Added rnn architecture type