# QWEN7B to LLAMA7B Model Structure
Author: XD / Published: November 13, 2023, 21:00 / Updated: November 13, 2023, 21:06 / Research Notes
Below, in markdown, is the resulting model structure: the Qwen-7B weights laid out in the LLAMA7B (Hugging Face) format, detailing each layer and component. Note that the vocabulary size and the q/k/v biases are carried over from Qwen-7B; stock LLAMA7B has a 32000-token vocabulary and no attention biases.
## LLAMA7B Model Structure
The LLAMA7B model consists of the following layers and components:
### Embedding Layer

- `model.embed_tokens.weight`: `torch.Size([151851, 4096])`
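As a quick sanity check, the embedding table can be instantiated with exactly these dimensions; the 151851-entry vocabulary comes from Qwen-7B's tokenizer. A minimal sketch (the `meta` device keeps this shapes-only, so the roughly 2.5 GB of fp32 weights are never allocated):

```python
import torch
import torch.nn as nn

# Shapes taken from the listing above: Qwen-7B vocabulary, 4096 hidden dim
with torch.device("meta"):  # meta device: shapes only, no memory allocated
    embed_tokens = nn.Embedding(151851, 4096)

print(embed_tokens.weight.shape)  # torch.Size([151851, 4096])
```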
### Layers
Each layer in the model has the following components:
#### Layer 0 to Layer 31
Each layer (`model.layers.[0-31]`) includes the parameters below (a PyTorch sketch of one such layer follows the list):
- `input_layernorm.weight`: `torch.Size([4096])`
- Self-Attention Sublayer:
  - `q_proj.weight`: `torch.Size([4096, 4096])`
  - `k_proj.weight`: `torch.Size([4096, 4096])`
  - `v_proj.weight`: `torch.Size([4096, 4096])`
  - `q_proj.bias`: `torch.Size([4096])`
  - `k_proj.bias`: `torch.Size([4096])`
  - `v_proj.bias`: `torch.Size([4096])`
  - `o_proj.weight`: `torch.Size([4096, 4096])`
- `post_attention_layernorm.weight`: `torch.Size([4096])`
- MLP (Multi-Layer Perceptron) Sublayer:
  - `up_proj.weight`: `torch.Size([11008, 4096])`
  - `gate_proj.weight`: `torch.Size([11008, 4096])`
  - `down_proj.weight`: `torch.Size([4096, 11008])`
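Putting these shapes together, here is a minimal PyTorch sketch of one decoder layer that reproduces the parameter shapes listed above. It is illustrative, not the reference implementation: class and constant names are my own, 32 attention heads are assumed (both Qwen-7B and LLaMA-7B use 32), and rotary position embeddings are omitted for brevity. Note the `bias=True` on q/k/v, which is the Qwen-7B convention; stock LLAMA7B would use `bias=False` there too.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

HIDDEN, INTERMEDIATE, N_HEADS = 4096, 11008, 32  # from the shape listing above

class RMSNorm(nn.Module):
    """LLaMA-style RMSNorm; owns the weight of shape [4096] seen above."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps
    def forward(self, x):
        var = x.pow(2).mean(-1, keepdim=True)
        return self.weight * x * torch.rsqrt(var + self.eps)

class SelfAttention(nn.Module):
    """q/k/v carry biases (inherited from Qwen-7B); o_proj has none."""
    def __init__(self):
        super().__init__()
        self.q_proj = nn.Linear(HIDDEN, HIDDEN, bias=True)
        self.k_proj = nn.Linear(HIDDEN, HIDDEN, bias=True)
        self.v_proj = nn.Linear(HIDDEN, HIDDEN, bias=True)
        self.o_proj = nn.Linear(HIDDEN, HIDDEN, bias=False)
        self.head_dim = HIDDEN // N_HEADS
    def forward(self, x):
        B, T, _ = x.shape
        q = self.q_proj(x).view(B, T, N_HEADS, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(B, T, N_HEADS, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(B, T, N_HEADS, self.head_dim).transpose(1, 2)
        # RoPE omitted for brevity; causal attention only
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o_proj(out.transpose(1, 2).reshape(B, T, HIDDEN))

class MLP(nn.Module):
    """SwiGLU MLP: down_proj(silu(gate_proj(x)) * up_proj(x))."""
    def __init__(self):
        super().__init__()
        self.gate_proj = nn.Linear(HIDDEN, INTERMEDIATE, bias=False)
        self.up_proj = nn.Linear(HIDDEN, INTERMEDIATE, bias=False)
        self.down_proj = nn.Linear(INTERMEDIATE, HIDDEN, bias=False)
    def forward(self, x):
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))

class DecoderLayer(nn.Module):
    """One of the 32 identical model.layers.[0-31] blocks, pre-norm residual style."""
    def __init__(self):
        super().__init__()
        self.input_layernorm = RMSNorm(HIDDEN)
        self.self_attn = SelfAttention()
        self.post_attention_layernorm = RMSNorm(HIDDEN)
        self.mlp = MLP()
    def forward(self, x):
        x = x + self.self_attn(self.input_layernorm(x))
        return x + self.mlp(self.post_attention_layernorm(x))

layer = DecoderLayer()
x = torch.randn(1, 8, HIDDEN)  # (batch, seq_len, hidden)
print(layer(x).shape)          # torch.Size([1, 8, 4096])
```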
### Final Layer Normalization and Output

- `model.norm.weight`: `torch.Size([4096])`
- `lm_head.weight`: `torch.Size([151851, 4096])`
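A listing like the one above can be generated by iterating over the converted checkpoint's named parameters. A minimal sketch, assuming the converted weights sit at a hypothetical local path and load with Hugging Face Transformers:

```python
from transformers import AutoModelForCausalLM

# Hypothetical local path to the Qwen-7B weights converted to LLaMA format
model = AutoModelForCausalLM.from_pretrained("./qwen7b-as-llama7b")

# Print every parameter name with its shape, matching the listing above
for name, param in model.named_parameters():
    print(f"{name}: {tuple(param.shape)}")
```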
