EADST

QWEN7B to LLAMA7B Model Structure

This post lists the structure of the LLAMA7B-format model, detailing each layer and component.


LLAMA7B Model Structure

The LLAMA7B model consists of the following layers and components:

Embedding Layer

  • model.embed_tokens.weight: torch.Size([151851, 4096])

Layers

Layer 0 to Layer 31

Each of the 32 layers (model.layers.[0-31]) includes:

  • input_layernorm.weight: torch.Size([4096])

  • Self-Attention Sublayer:

    • q_proj.weight: torch.Size([4096, 4096])

    • k_proj.weight: torch.Size([4096, 4096])

    • v_proj.weight: torch.Size([4096, 4096])

    • q_proj.bias: torch.Size([4096])

    • k_proj.bias: torch.Size([4096])

    • v_proj.bias: torch.Size([4096])

    • o_proj.weight: torch.Size([4096, 4096])

  • post_attention_layernorm.weight: torch.Size([4096])

  • MLP (Multi-Layer Perceptron) Sublayer:

    • up_proj.weight: torch.Size([11008, 4096])

    • gate_proj.weight: torch.Size([11008, 4096])

    • down_proj.weight: torch.Size([4096, 11008])
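
The per-layer listing above can be turned into a small lookup of expected tensor names and shapes, which is handy when checking a converted checkpoint. The following is only a sketch: the `self_attn.` and `mlp.` prefixes are an assumption based on the Hugging Face LLaMA naming convention (the listing above omits them), and `layer_shapes` is a hypothetical helper, not part of any library. Shapes are written as plain tuples, which compare equal to `torch.Size` values.

```python
from math import prod

HIDDEN = 4096         # hidden size, taken from the shapes listed above
INTERMEDIATE = 11008  # MLP intermediate size, taken from the shapes listed above


def layer_shapes(i: int) -> dict[str, tuple[int, ...]]:
    """Return the expected tensor names and shapes for model.layers.<i>."""
    # Assumption: attention tensors live under "self_attn." and MLP tensors
    # under "mlp.", as in the Hugging Face LLaMA naming convention.
    p = f"model.layers.{i}."
    shapes = {
        p + "input_layernorm.weight": (HIDDEN,),
        p + "post_attention_layernorm.weight": (HIDDEN,),
        p + "self_attn.o_proj.weight": (HIDDEN, HIDDEN),
        p + "mlp.up_proj.weight": (INTERMEDIATE, HIDDEN),
        p + "mlp.gate_proj.weight": (INTERMEDIATE, HIDDEN),
        p + "mlp.down_proj.weight": (HIDDEN, INTERMEDIATE),
    }
    # q/k/v each carry a weight and a bias; o_proj has a weight only.
    for name in ("q_proj", "k_proj", "v_proj"):
        shapes[p + f"self_attn.{name}.weight"] = (HIDDEN, HIDDEN)
        shapes[p + f"self_attn.{name}.bias"] = (HIDDEN,)
    return shapes


layer0 = layer_shapes(0)
params_per_layer = sum(prod(shape) for shape in layer0.values())
print(len(layer0), "tensors per layer,", f"{params_per_layer:,}", "parameters")
```

Each layer holds 12 tensors, so comparing a loaded state dict against `layer_shapes(i)` for `i` in `range(32)` would flag any missing bias or transposed projection.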

Final Layer Normalization and Output

  • model.norm.weight: torch.Size([4096])
  • lm_head.weight: torch.Size([151851, 4096])
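
As a sanity check, the shapes above can be summed into a total parameter count. This is a minimal sketch using only the sizes listed in this post; it assumes `lm_head.weight` is stored as its own tensor rather than tied to `model.embed_tokens.weight`, since both appear in the listing.

```python
VOCAB = 151851        # rows of embed_tokens and lm_head
HIDDEN = 4096         # hidden size
INTERMEDIATE = 11008  # MLP intermediate size
NUM_LAYERS = 32       # model.layers.0 through model.layers.31

embed = VOCAB * HIDDEN                    # model.embed_tokens.weight
attn = 4 * HIDDEN * HIDDEN + 3 * HIDDEN   # q/k/v/o weights plus q/k/v biases
mlp = 3 * HIDDEN * INTERMEDIATE           # up_proj, gate_proj, down_proj
norms = 2 * HIDDEN                        # input and post-attention layernorm
per_layer = attn + mlp + norms

final_norm = HIDDEN                       # model.norm.weight
lm_head = VOCAB * HIDDEN                  # assumed untied from embed_tokens

total = embed + NUM_LAYERS * per_layer + final_norm + lm_head
print(f"{total:,}")  # 7,720,628,224
```

The total comes to roughly 7.72 billion parameters, of which the embedding and output matrices account for about 1.24 billion because of the large vocabulary.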