# Qwen-7B to LLaMA GPTQ Model Structure
Author: XD / Posted: 2023-11-13 21:32 / Updated: 2023-11-13 21:32 / Research Notes / Views: 1768
Here is the GPTQ model structure in markdown format, detailing each layer and component.

## GPTQ Model Structure

The GPTQ model consists of the following layers and components.
### Embedding Layer

- `model.embed_tokens.weight`: torch.Size([151851, 4096])
### Layers

Each layer in the model has the following components.

#### Layer 0 to Layer 31

Each layer (`model.layers.[0-31]`) includes the components below; a short sketch after the list shows how the quantized tensor shapes are derived.
- `input_layernorm.weight`: torch.Size([4096])
- Self-Attention Sublayer:
  - `k_proj`:
    - `qweight`: torch.Size([512, 4096])
    - `qzeros`: torch.Size([32, 512])
    - `scales`: torch.Size([32, 4096])
    - `g_idx`: torch.Size([4096])
    - `bias`: torch.Size([4096])
  - `o_proj`:
    - `qweight`: torch.Size([512, 4096])
    - `qzeros`: torch.Size([32, 512])
    - `scales`: torch.Size([32, 4096])
    - `g_idx`: torch.Size([4096])
    - `bias`: torch.Size([4096])
  - `q_proj`:
    - `qweight`: torch.Size([512, 4096])
    - `qzeros`: torch.Size([32, 512])
    - `scales`: torch.Size([32, 4096])
    - `g_idx`: torch.Size([4096])
    - `bias`: torch.Size([4096])
  - `v_proj`:
    - `qweight`: torch.Size([512, 4096])
    - `qzeros`: torch.Size([32, 512])
    - `scales`: torch.Size([32, 4096])
    - `g_idx`: torch.Size([4096])
    - `bias`: torch.Size([4096])
- MLP (Multi-Layer Perceptron) Sublayer:
  - `down_proj`:
    - `qweight`: torch.Size([1376, 4096])
    - `qzeros`: torch.Size([86, 512])
    - `scales`: torch.Size([86, 4096])
    - `g_idx`: torch.Size([11008])
    - `bias`: torch.Size([4096])
  - `gate_proj`:
    - `qweight`: torch.Size([512, 11008])
    - `qzeros`: torch.Size([32, 1376])
    - `scales`: torch.Size([32, 11008])
    - `g_idx`: torch.Size([4096])
    - `bias`: torch.Size([11008])
  - `up_proj`:
    - `qweight`: torch.Size([512, 11008])
    - `qzeros`: torch.Size([32, 1376])
    - `scales`: torch.Size([32, 11008])
    - `g_idx`: torch.Size([4096])
    - `bias`: torch.Size([11008])
- `post_attention_layernorm.weight`: torch.Size([4096])
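
The quantized tensor shapes above are consistent with standard 4-bit GPTQ packing at group size 128, where eight 4-bit values are packed into each int32. The following sketch (my own illustration, not code from the original conversion; the bit width and group size are inferred from the shapes) derives the listed shapes from each projection's input and output dimensions:

```python
# Sketch: expected GPTQ tensor shapes, assuming 4-bit quantization,
# group size 128, and 8 four-bit values packed per int32.
BITS = 4
GROUP_SIZE = 128
PACK = 32 // BITS  # 8 values per int32

def gptq_shapes(in_features: int, out_features: int):
    """Return the expected tensor shapes of a GPTQ-quantized linear layer."""
    return {
        "qweight": (in_features // PACK, out_features),
        "qzeros":  (in_features // GROUP_SIZE, out_features // PACK),
        "scales":  (in_features // GROUP_SIZE, out_features),
        "g_idx":   (in_features,),
        "bias":    (out_features,),
    }

# Attention projections (q/k/v/o): 4096 -> 4096
print(gptq_shapes(4096, 4096))    # qweight (512, 4096), qzeros (32, 512), ...

# MLP gate_proj / up_proj: 4096 -> 11008
print(gptq_shapes(4096, 11008))   # qweight (512, 11008), qzeros (32, 1376), ...

# MLP down_proj: 11008 -> 4096
print(gptq_shapes(11008, 4096))   # qweight (1376, 4096), qzeros (86, 512), ...
```

Note that `qweight` is packed along the input dimension while `qzeros` is packed along the output dimension, which is why their row counts differ from each other.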
### Final Layer Normalization and Output

- `model.norm.weight`: torch.Size([4096])
- `lm_head.weight`: torch.Size([151851, 4096])
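
To reproduce a listing like this, you can iterate over the checkpoint's state dict and print every tensor name with its shape. A minimal sketch, assuming the converted model is stored as a single hypothetical `pytorch_model.bin` file (a sharded or safetensors checkpoint would need the corresponding loader instead):

```python
# Minimal sketch: dump parameter names and shapes from a single-file
# PyTorch checkpoint (hypothetical path, adjust to your converted model).
import torch

state_dict = torch.load("pytorch_model.bin", map_location="cpu")
for name, tensor in state_dict.items():
    print(f"{name}: {tuple(tensor.shape)}")
```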
