EADST

GGML Q4_0 Quantization Analysis in llama.cpp

For the LLaMA-7B model there are 387 tensors in total, consisting of various weights and biases: token_embd.weight; 32 blocks of attention and feed-forward weights and biases (attn_norm.weight, attn_q.weight, attn_k.weight, attn_v.weight, attn_q.bias, attn_k.bias, attn_v.bias, attn_output.weight, ffn_norm.weight, ffn_up.weight, ffn_gate.weight, ffn_down.weight); and finally output_norm.weight and output.weight.

Quantization Details:

  • Total tensors for quantization: 226
  • token_embd.weight (1 tensor)
  • 32 sets of: attn_q.weight, attn_k.weight, attn_v.weight, attn_output.weight, ffn_up.weight, ffn_gate.weight, ffn_down.weight (224 tensors)
  • output.weight (1 tensor)
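The size reductions in the log below can be reproduced from the Q4_0 storage format: weights are grouped into blocks of 32, each block stored as one f16 scale (2 bytes) plus 32 four-bit quants (16 bytes), i.e. 18 bytes per 32 weights, or 4.5 bits per weight. A minimal sketch of the size math:

```python
# Q4_0 block layout in GGML: 32 weights per block,
# one f16 scale (2 bytes) + 32 x 4-bit quants (16 bytes) = 18 bytes.
Q4_0_BLOCK_WEIGHTS = 32
Q4_0_BLOCK_BYTES = 2 + 32 // 2  # 18

def f16_size_mb(n_elements: int) -> float:
    """Size of an f16 tensor in MB (1 MB = 1024 * 1024 bytes)."""
    return n_elements * 2 / (1024 * 1024)

def q4_0_size_mb(n_elements: int) -> float:
    """Size of the same tensor after Q4_0 quantization."""
    n_blocks = n_elements // Q4_0_BLOCK_WEIGHTS
    return n_blocks * Q4_0_BLOCK_BYTES / (1024 * 1024)

# token_embd.weight: [4096, 151851]
n = 4096 * 151851
print(f"{f16_size_mb(n):.2f} MB -> {q4_0_size_mb(n):.2f} MB")
# matches the log: 1186.34 MB -> 333.66 MB
```

At 18/64 of the f16 footprint, every 32.00 MB f16 matrix lands at exactly 9.00 MB, as seen in the attention tensors below.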

Tensor Breakdown:

  • llama_model_loader:
  • f32 type: 161 tensors (the 65 norm weights plus the 96 attention biases)
  • f16 type: 226 tensors (the weight matrices selected for quantization)
  • llama_model_quantize_internal:
  • Meta size: 6162784 bytes

Example Tensors:

  • [ 1/ 387] token_embd.weight - [ 4096, 151851, 1, 1], type = f16, quantizing to q4_0 .. size = 1186.34 MB -> 333.66 MB | hist: 0.036 0.016 0.025 0.039 0.057 0.077 0.096 0.111 0.117 0.111 0.096 0.077 0.057 0.039 0.025 0.021

[ 2/ 387] blk.0.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB

  • [ 3/ 387] blk.0.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, quantizing to q4_0 .. size = 32.00 MB -> 9.00 MB | hist: 0.036 0.015 0.025 0.038 0.056 0.076 0.097 0.113 0.120 0.113 0.097 0.076 0.056 0.038 0.025 0.020

  • [ 4/ 387] blk.0.attn_k.weight - [ 4096, 4096, 1, 1], type = f16, quantizing to q4_0 .. size = 32.00 MB -> 9.00 MB | hist: 0.036 0.015 0.024 0.037 0.055 0.075 0.097 0.115 0.123 0.115 0.097 0.076 0.055 0.037 0.024 0.020

  • [ 5/ 387] blk.0.attn_v.weight - [ 4096, 4096, 1, 1], type = f16, quantizing to q4_0 .. size = 32.00 MB -> 9.00 MB | hist: 0.036 0.016 0.025 0.039 0.056 0.076 0.096 0.112 0.119 0.112 0.096 0.076 0.056 0.039 0.025 0.021

[ 6/ 387] blk.0.attn_q.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB

[ 7/ 387] blk.0.attn_k.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB

[ 8/ 387] blk.0.attn_v.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB

  • [ 9/ 387] blk.0.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, quantizing to q4_0 .. size = 32.00 MB -> 9.00 MB | hist: 0.036 0.015 0.025 0.039 0.056 0.077 0.096 0.112 0.118 0.112 0.096 0.077 0.056 0.039 0.025 0.021

[ 10/ 387] blk.0.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB

  • [ 11/ 387] blk.0.ffn_up.weight - [ 4096, 11008, 1, 1], type = f16, quantizing to q4_0 .. size = 86.00 MB -> 24.19 MB | hist: 0.037 0.016 0.025 0.039 0.057 0.077 0.096 0.111 0.116 0.111 0.096 0.077 0.057 0.039 0.025 0.021

  • [ 12/ 387] blk.0.ffn_gate.weight - [ 4096, 11008, 1, 1], type = f16, quantizing to q4_0 .. size = 86.00 MB -> 24.19 MB | hist: 0.037 0.016 0.026 0.039 0.057 0.077 0.096 0.110 0.116 0.110 0.096 0.077 0.057 0.040 0.026 0.021

  • [ 13/ 387] blk.0.ffn_down.weight - [11008, 4096, 1, 1], type = f16, quantizing to q4_0 .. size = 86.00 MB -> 24.19 MB | hist: 0.036 0.016 0.025 0.039 0.057 0.077 0.096 0.111 0.117 0.111 0.096 0.077 0.057 0.039 0.025 0.021

*... and so on for tensors [ 14/ 387] through [385/ 387] ...*

The remaining 31 blocks (blk.1.* through blk.31.*) follow the same pattern as blk.0.*.
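The "hist:" columns count how often each of the 16 possible 4-bit codes appears after quantization; the bell shape peaking at index 8 (the zero point) reflects the roughly Gaussian weight distribution. A simplified Python sketch of Q4_0-style rounding and histogram collection (mirroring the logic of GGML's quantize_row_q4_0, not the optimized reference implementation):

```python
import random

def quantize_q4_0_block(xs):
    """Quantize one block of 32 floats to 4-bit codes (0..15), Q4_0 style:
    the value of largest magnitude maps to code 0, zero maps to code 8."""
    amax_val = max(xs, key=abs)          # value with largest magnitude, sign kept
    d = amax_val / -8 if amax_val != 0 else 1.0
    codes = []
    for x in xs:
        q = int(x / d + 8.5)             # round to nearest, shift zero point to 8
        codes.append(min(15, max(0, q)))
    return d, codes

def q4_0_histogram(weights):
    """Normalized frequency of each 4-bit code over all 32-weight blocks."""
    counts = [0] * 16
    for i in range(0, len(weights) - 31, 32):
        _, codes = quantize_q4_0_block(weights[i:i + 32])
        for c in codes:
            counts[c] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]

# Roughly Gaussian weights reproduce the bell-shaped histogram in the log
random.seed(0)
w = [random.gauss(0.0, 0.02) for _ in range(32 * 1000)]
hist = q4_0_histogram(w)
```

Because each block is scaled by its own largest-magnitude value, the extreme codes (0 and 15) are rare while the central code 8 dominates, exactly as the ~0.11-0.12 peaks in the logged histograms show.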

[ 386/ 387] output_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB

  • [ 387/ 387] output.weight - [ 4096, 151851, 1, 1], type = f16, quantizing to q6_K .. size = 1186.34 MB -> 486.58 MB (no histogram is printed for k-quant types)
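Note that output.weight is quantized to q6_K rather than q4_0 (llama.cpp keeps the output matrix at higher precision for quality). Assuming the standard q6_K layout, a super-block of 256 weights is stored in 210 bytes (6.5625 bits per weight), which reproduces the logged size:

```python
# q6_K super-block: 256 weights -> 210 bytes
# (128 B of 4-bit low bits + 64 B of 2-bit high bits
#  + 16 B of 8-bit sub-block scales + 2 B f16 super-block scale)
Q6_K_BLOCK_WEIGHTS = 256
Q6_K_BLOCK_BYTES = 128 + 64 + 16 + 2  # 210

def q6_k_size_mb(n_elements: int) -> float:
    """Size of a tensor after q6_K quantization, in MB."""
    return n_elements // Q6_K_BLOCK_WEIGHTS * Q6_K_BLOCK_BYTES / (1024 * 1024)

n = 4096 * 151851
print(f"{q6_k_size_mb(n):.2f} MB")  # 486.58 MB, as in the log
```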

llama_model_quantize_internal: model size = 14727.19 MB

llama_model_quantize_internal: quant size = 4296.76 MB

llama_model_quantize_internal: hist: 0.036 0.016 0.025 0.039 0.056 0.077 0.096 0.111 0.117 0.111 0.096 0.077 0.057 0.039 0.025 0.021
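The totals imply roughly 3.4x compression, slightly below the ideal f16-to-Q4_0 ratio of 16 / 4.5 ≈ 3.56; the f32 norm tensors, the attention biases, and the larger q6_K output matrix account for the gap. A quick check:

```python
# Totals taken from the log above
model_mb, quant_mb = 14727.19, 4296.76
ratio = model_mb / quant_mb

bits_q4_0 = 18 * 8 / 32   # 4.5 bits per weight for Q4_0
print(f"compression: {ratio:.2f}x, ideal f16/q4_0: {16 / bits_q4_0:.2f}x")
```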
