
I've recently been watching a course on model quantization.
When quantizing the model below, the instructor recommends not quantizing the final lm_head:
CodeGenForCausalLM(
  (transformer): CodeGenModel(
    (wte): Embedding(51200, 1024)
    (drop): Dropout(p=0.0, inplace=False)
    (h): ModuleList(
      (0-19): 20 x CodeGenBlock(
        (ln_1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (attn): CodeGenAttention(
          (attn_dropout): Dropout(p=0.0, inplace=False)
          (resid_dropout): Dropout(p=0.0, inplace=False)
          (qkv_proj): W8A16LinearLayer()
          (out_proj): W8A16LinearLayer()
        )
        (mlp): CodeGenMLP(
          (fc_in): W8A16LinearLayer()
          (fc_out): W8A16LinearLayer()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.0, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=1024, out_features=51200, bias=True)
)
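For context, the course swaps every nn.Linear for a W8A16LinearLayer (int8 weights, full-precision activations) but leaves lm_head as a plain Linear, as you can see in the printout above. I don't have the course code at hand, so the sketch below is my own rough approximation of that replacement step, just to show what "skip lm_head" means in practice; the class name is taken from the printout, its internals are my guess.

import torch
import torch.nn as nn

class W8A16LinearLayer(nn.Module):
    # My guess at what the layer does: store int8 weights with per-output-channel
    # scales, dequantize on the fly, keep activations in full precision.
    def __init__(self, linear: nn.Linear):
        super().__init__()
        w = linear.weight.detach()
        scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0  # symmetric per-row scale
        self.register_buffer("int8_weight", torch.round(w / scale).to(torch.int8))
        self.register_buffer("scale", scale)
        self.bias = linear.bias

    def forward(self, x):
        w = self.int8_weight.to(x.dtype) * self.scale.to(x.dtype)  # dequantized weight
        return nn.functional.linear(x, w, self.bias)

def replace_linears(module: nn.Module, skip_names=("lm_head",)):
    # Recursively swap nn.Linear children, except the ones named in skip_names.
    for name, child in module.named_children():
        if isinstance(child, nn.Linear) and name not in skip_names:
            setattr(module, name, W8A16LinearLayer(child))
        else:
            replace_linears(child, skip_names)

Calling replace_linears(model) and printing the model gives a structure like the one above: qkv_proj, out_proj, fc_in, fc_out become W8A16LinearLayer, while lm_head stays a regular Linear.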
His original explanation (from the lecture transcript, around 2:14):

"And as I said we're not going to quantize the language model head, because since the model is an autoregressive model, it uses the output from the previous iteration to get the output of the next iteration. If you quantize the language model head, a lot of errors might be accumulating over the generation steps. And you will most likely end up having some gibberish after some tokens."

I don't follow his reasoning here. Why would quantizing lm_head cause errors to accumulate across generation steps? Could someone explain it in simple terms?
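To make the question concrete, this is the generation loop I have in mind (my own sketch, not code from the course; the checkpoint name is a guess based on the printed shapes, and any causal LM behaves the same way). The part the instructor seems to stress is that the token chosen from this step's lm_head logits becomes part of the input of the next step.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Checkpoint is a guess from the printed shapes (51200 vocab, 1024 hidden, 20 blocks).
model_id = "Salesforce/codegen-350M-mono"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

input_ids = tokenizer("def hello_world():", return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(20):
        logits = model(input_ids).logits            # (1, seq_len, 51200), produced by lm_head
        next_id = logits[:, -1, :].argmax(dim=-1)   # greedy pick of the next token
        # The token picked from this step's lm_head output is appended to the
        # input of the next step -- the feedback loop he is talking about.
        input_ids = torch.cat([input_ids, next_id.unsqueeze(-1)], dim=-1)

print(tokenizer.decode(input_ids[0]))

So if I understand him correctly, any change in the lm_head logits can change which token gets picked, and that token then feeds every later step. What I still don't see is why this makes lm_head especially sensitive compared with the other layers that were quantized.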