QWEN7B to LLAMA7B Model Structure

LLAMA7B Model Structure

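The LLAMA7B model consists of the following layers and components. The shapes are those of a QWEN7B checkpoint stored in the LLAMA layout, so the vocabulary size is 151851 and the q/k/v projections carry bias terms, unlike the original LLaMA 7B weights: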

Embedding Layer

  • model.embed_tokens.weight: torch.Size([151851, 4096])

Layers

Layer 0 to Layer 31

Each of the 32 layers (model.layers.[0-31]) has the same components:

  • input_layernorm.weight: torch.Size([4096])

  • Self-Attention Sublayer:

    • q_proj.weight: torch.Size([4096, 4096])

    • k_proj.weight: torch.Size([4096, 4096])

    • v_proj.weight: torch.Size([4096, 4096])

    • q_proj.bias: torch.Size([4096])

    • k_proj.bias: torch.Size([4096])

    • v_proj.bias: torch.Size([4096])

    • o_proj.weight: torch.Size([4096, 4096])

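  • post_attention_layernorm.weight: torch.Size([4096]) (applied after self-attention, before the MLP; a layer-level weight, not part of the self-attention sublayer)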

  • MLP (Multi-Layer Perceptron) Sublayer:

    • up_proj.weight: torch.Size([11008, 4096])

    • gate_proj.weight: torch.Size([11008, 4096])

    • down_proj.weight: torch.Size([4096, 11008])
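The per-layer listing above can be turned into a small shape table. The sketch below is only illustrative: the `self_attn` and `mlp` key prefixes are an assumption based on the usual Hugging Face LLaMA naming and do not appear in the listing itself, and `layer_shapes` is a hypothetical helper name.

```python
from math import prod

HIDDEN = 4096         # hidden size, taken from the shapes listed above
INTERMEDIATE = 11008  # MLP inner size, taken from the shapes listed above


def layer_shapes(i):
    """Expected tensor shapes for decoder layer i.

    The 'self_attn' and 'mlp' prefixes are assumed (Hugging Face LLaMA
    convention); the listing above only gives the leaf names.
    """
    p = f"model.layers.{i}"
    shapes = {
        f"{p}.input_layernorm.weight": (HIDDEN,),
        f"{p}.post_attention_layernorm.weight": (HIDDEN,),
        f"{p}.self_attn.o_proj.weight": (HIDDEN, HIDDEN),
        f"{p}.mlp.up_proj.weight": (INTERMEDIATE, HIDDEN),
        f"{p}.mlp.gate_proj.weight": (INTERMEDIATE, HIDDEN),
        f"{p}.mlp.down_proj.weight": (HIDDEN, INTERMEDIATE),
    }
    # q/k/v each have a weight matrix and a bias vector; o_proj has no bias.
    for name in ("q_proj", "k_proj", "v_proj"):
        shapes[f"{p}.self_attn.{name}.weight"] = (HIDDEN, HIDDEN)
        shapes[f"{p}.self_attn.{name}.bias"] = (HIDDEN,)
    return shapes


shapes = layer_shapes(0)
params_per_layer = sum(prod(s) for s in shapes.values())
print(len(shapes), "tensors,", f"{params_per_layer:,}", "parameters per layer")
```

Each layer holds 12 tensors, and the three MLP matrices account for roughly two thirds of the per-layer parameters.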

Final Layer Normalization and Output

  • model.norm.weight: torch.Size([4096])
  • lm_head.weight: torch.Size([151851, 4096])
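
As a sanity check, the shapes listed in this post can be summed into a total parameter count. This is a rough sketch that only uses the numbers above; the constant and variable names are chosen for illustration, and it assumes `lm_head.weight` is stored separately from `model.embed_tokens.weight`, as the two entries in the listing suggest.

```python
VOCAB = 151851        # rows of embed_tokens and lm_head
HIDDEN = 4096
INTERMEDIATE = 11008
NUM_LAYERS = 32       # model.layers.[0-31]

embed = VOCAB * HIDDEN                        # model.embed_tokens.weight
attention = 4 * HIDDEN * HIDDEN + 3 * HIDDEN  # q/k/v/o weights + q/k/v biases
mlp = 3 * HIDDEN * INTERMEDIATE               # up, gate, down projections
layer_norms = 2 * HIDDEN                      # input + post-attention norms
per_layer = attention + mlp + layer_norms

final_norm = HIDDEN                           # model.norm.weight
lm_head = VOCAB * HIDDEN                      # lm_head.weight (assumed untied)

total = embed + NUM_LAYERS * per_layer + final_norm + lm_head
print(f"{total:,} parameters (~{total / 1e9:.2f}B)")
```

With these shapes the total comes to about 7.72 billion parameters, of which the embedding and output matrices contribute roughly 1.24 billion because of the large vocabulary.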