QWEN7B to LLAMA7B Model Structure | 东毅居士

Author: XD / Published: November 13, 2023 21:00 / Updated: November 13, 2023 21:06 / Research & Study / Views: 1712

This post lists the parameter names and tensor shapes of a QWEN7B model converted to the LLAMA7B structure, layer by layer.

LLAMA7B Model Structure

The LLAMA7B model consists of the following layers and components:

Embedding Layer

model.embed_tokens.weight: torch.Size([151851, 4096])

Layers

Layer 0 to Layer 31

Each of the 32 layers (model.layers.[0-31]) includes:

input_layernorm.weight: torch.Size([4096])
Self-Attention Sublayer:
- q_proj.weight: torch.Size([4096, 4096])
- k_proj.weight: torch.Size([4096, 4096])
- v_proj.weight: torch.Size([4096, 4096])
- q_proj.bias: torch.Size([4096])
- k_proj.bias: torch.Size([4096])
- v_proj.bias: torch.Size([4096])
- o_proj.weight: torch.Size([4096, 4096])
post_attention_layernorm.weight: torch.Size([4096])
MLP (Multi-Layer Perceptron) Sublayer:
- up_proj.weight: torch.Size([11008, 4096])
- gate_proj.weight: torch.Size([11008, 4096])
- down_proj.weight: torch.Size([4096, 11008])
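
The per-layer shapes listed above can be tallied to see how many parameters one layer holds. The following is a minimal sketch in plain Python. The `self_attn.` and `mlp.` prefixes are an assumption based on the usual Hugging Face LLaMA naming and do not appear in the listing itself. `LAYER_SHAPES` and `count_params` are hypothetical names used only for illustration.

```python
from math import prod

HIDDEN = 4096         # hidden size, taken from the shapes listed above
INTERMEDIATE = 11008  # MLP inner size, taken from the shapes listed above

# Name -> shape for one layer. The "self_attn." and "mlp." prefixes are
# assumed (Hugging Face LLaMA convention), not taken from the listing.
LAYER_SHAPES = {
    "input_layernorm.weight": (HIDDEN,),
    "self_attn.q_proj.weight": (HIDDEN, HIDDEN),
    "self_attn.k_proj.weight": (HIDDEN, HIDDEN),
    "self_attn.v_proj.weight": (HIDDEN, HIDDEN),
    "self_attn.q_proj.bias": (HIDDEN,),
    "self_attn.k_proj.bias": (HIDDEN,),
    "self_attn.v_proj.bias": (HIDDEN,),
    "self_attn.o_proj.weight": (HIDDEN, HIDDEN),
    "post_attention_layernorm.weight": (HIDDEN,),
    "mlp.up_proj.weight": (INTERMEDIATE, HIDDEN),
    "mlp.gate_proj.weight": (INTERMEDIATE, HIDDEN),
    "mlp.down_proj.weight": (HIDDEN, INTERMEDIATE),
}


def count_params(shapes):
    """Sum the element counts of all tensors in a name -> shape mapping."""
    return sum(prod(shape) for shape in shapes.values())


per_layer = count_params(LAYER_SHAPES)
print(f"{len(LAYER_SHAPES)} tensors, {per_layer:,} parameters per layer")
```

Note that only q_proj, k_proj, and v_proj carry a bias in this listing. o_proj and the MLP projections have weights only.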

Final Layer Normalization and Output

model.norm.weight: torch.Size([4096])
lm_head.weight: torch.Size([151851, 4096])
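
As a rough cross-check of the full listing, the expected tensor names and shapes can be generated programmatically and compared against a converted checkpoint. This is a sketch under stated assumptions: the `self_attn.` and `mlp.` prefixes follow the usual Hugging Face LLaMA naming and are not shown in the listing, and `expected_shapes` and `find_mismatches` are hypothetical helpers, not part of any library.

```python
from math import prod


def expected_shapes(num_layers=32, hidden=4096, intermediate=11008, vocab=151851):
    """Build a name -> shape mapping matching the structure listed above."""
    shapes = {"model.embed_tokens.weight": (vocab, hidden)}
    for i in range(num_layers):
        p = f"model.layers.{i}."
        shapes[p + "input_layernorm.weight"] = (hidden,)
        # "self_attn." / "mlp." prefixes are assumed, see the lead-in.
        for proj in ("q_proj", "k_proj", "v_proj"):
            shapes[p + f"self_attn.{proj}.weight"] = (hidden, hidden)
            shapes[p + f"self_attn.{proj}.bias"] = (hidden,)
        shapes[p + "self_attn.o_proj.weight"] = (hidden, hidden)
        shapes[p + "post_attention_layernorm.weight"] = (hidden,)
        shapes[p + "mlp.up_proj.weight"] = (intermediate, hidden)
        shapes[p + "mlp.gate_proj.weight"] = (intermediate, hidden)
        shapes[p + "mlp.down_proj.weight"] = (hidden, intermediate)
    shapes["model.norm.weight"] = (hidden,)
    shapes["lm_head.weight"] = (vocab, hidden)
    return shapes


def find_mismatches(actual, expected):
    """Return names that are missing from `actual` or differ in shape."""
    return sorted(
        name for name, shape in expected.items()
        if tuple(actual.get(name, ())) != shape
    )


shapes = expected_shapes()
total = sum(prod(s) for s in shapes.values())
print(f"{len(shapes)} tensors, {total:,} parameters")
```

To compare against a real checkpoint, `actual` could be built as `{name: tuple(t.shape) for name, t in state_dict.items()}`, where `state_dict` is whatever mapping the loading code returns.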

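Author: XD. Please cite the source when reposting: http://www.eadst.com/blog/216

This site is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.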
