# Qwen-7B to LLaMA GPTQ Model Structure
Author: XD / Posted: 2023-11-13 21:32 / Updated: 2023-11-13 21:32 / Research Notes / Views: 1768
Here is the GPTQ model structure in markdown format, detailing each layer and component.

## GPTQ Model Structure

The GPTQ model consists of the following layers and components.
### Embedding Layer

- `model.embed_tokens.weight`: torch.Size([151851, 4096])
### Layers

Each layer in the model has the following components.

#### Layer 0 to Layer 31

Each layer (`model.layers.[0-31]`) includes the components below; a short sketch after the list shows how the quantized tensor shapes are derived.
- `input_layernorm.weight`: torch.Size([4096])
- Self-Attention Sublayer:
  - `k_proj`:
    - `qweight`: torch.Size([512, 4096])
    - `qzeros`: torch.Size([32, 512])
    - `scales`: torch.Size([32, 4096])
    - `g_idx`: torch.Size([4096])
    - `bias`: torch.Size([4096])
  - `o_proj`:
    - `qweight`: torch.Size([512, 4096])
    - `qzeros`: torch.Size([32, 512])
    - `scales`: torch.Size([32, 4096])
    - `g_idx`: torch.Size([4096])
    - `bias`: torch.Size([4096])
  - `q_proj`:
    - `qweight`: torch.Size([512, 4096])
    - `qzeros`: torch.Size([32, 512])
    - `scales`: torch.Size([32, 4096])
    - `g_idx`: torch.Size([4096])
    - `bias`: torch.Size([4096])
  - `v_proj`:
    - `qweight`: torch.Size([512, 4096])
    - `qzeros`: torch.Size([32, 512])
    - `scales`: torch.Size([32, 4096])
    - `g_idx`: torch.Size([4096])
    - `bias`: torch.Size([4096])
- MLP (Multi-Layer Perceptron) Sublayer:
  - `down_proj`:
    - `qweight`: torch.Size([1376, 4096])
    - `qzeros`: torch.Size([86, 512])
    - `scales`: torch.Size([86, 4096])
    - `g_idx`: torch.Size([11008])
    - `bias`: torch.Size([4096])
  - `gate_proj`:
    - `qweight`: torch.Size([512, 11008])
    - `qzeros`: torch.Size([32, 1376])
    - `scales`: torch.Size([32, 11008])
    - `g_idx`: torch.Size([4096])
    - `bias`: torch.Size([11008])
  - `up_proj`:
    - `qweight`: torch.Size([512, 11008])
    - `qzeros`: torch.Size([32, 1376])
    - `scales`: torch.Size([32, 11008])
    - `g_idx`: torch.Size([4096])
    - `bias`: torch.Size([11008])
- `post_attention_layernorm.weight`: torch.Size([4096])
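
The quantized tensor shapes above are consistent with standard 4-bit GPTQ packing at group size 128, where eight 4-bit values are packed into each int32. The following sketch (my own illustration, not code from the original conversion; the bit width and group size are inferred from the shapes) derives the listed shapes from each projection's input and output dimensions:

```python
# Sketch: expected GPTQ tensor shapes, assuming 4-bit quantization,
# group size 128, and 8 four-bit values packed per int32.
BITS = 4
GROUP_SIZE = 128
PACK = 32 // BITS  # 8 values per int32

def gptq_shapes(in_features: int, out_features: int):
    """Return the expected tensor shapes of a GPTQ-quantized linear layer."""
    return {
        "qweight": (in_features // PACK, out_features),
        "qzeros":  (in_features // GROUP_SIZE, out_features // PACK),
        "scales":  (in_features // GROUP_SIZE, out_features),
        "g_idx":   (in_features,),
        "bias":    (out_features,),
    }

# Attention projections (q/k/v/o): 4096 -> 4096
print(gptq_shapes(4096, 4096))    # qweight (512, 4096), qzeros (32, 512), ...

# MLP gate_proj / up_proj: 4096 -> 11008
print(gptq_shapes(4096, 11008))   # qweight (512, 11008), qzeros (32, 1376), ...

# MLP down_proj: 11008 -> 4096
print(gptq_shapes(11008, 4096))   # qweight (1376, 4096), qzeros (86, 512), ...
```

Note that `qweight` is packed along the input dimension while `qzeros` is packed along the output dimension, which is why their row counts differ from each other.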
### Final Layer Normalization and Output

- `model.norm.weight`: torch.Size([4096])
- `lm_head.weight`: torch.Size([151851, 4096])
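
To reproduce a listing like this, you can iterate over the checkpoint's state dict and print every tensor name with its shape. A minimal sketch, assuming the converted model is stored as a single hypothetical `pytorch_model.bin` file (a sharded or safetensors checkpoint would need the corresponding loader instead):

```python
# Minimal sketch: dump parameter names and shapes from a single-file
# PyTorch checkpoint (hypothetical path, adjust to your converted model).
import torch

state_dict = torch.load("pytorch_model.bin", map_location="cpu")
for name, tensor in state_dict.items():
    print(f"{name}: {tuple(tensor.shape)}")
```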
