EADST

llama.cpp: Efficient 6-bit Data Packing in an 8-bit Array

This code snippet, adapted from llama.cpp by ggerganov, demonstrates a method for efficiently packing 6-bit values into an 8-bit uint8_t array. It combines scaling, clamping, and bitwise manipulation to store sixteen quantization scales in 12 bytes rather than 16, a useful saving in formats where every byte of the quantized block counts.

// Inverse scale factor: maps the largest scale (max_scale) to -32, the minimum of the signed 6-bit range.
float iscale = -32.f/max_scale;
// QK_K = 256, so QK_K/16 = 16: one 6-bit scale per 16-element sub-block.
for (int j = 0; j < QK_K/16; ++j) {
    // Scale and round the j-th element of the scales array to the nearest integer.
    int8_t l = nearest_int(iscale * scales[j]);

    // Clamp l to the signed 6-bit range [-32, 31], then add 32 to shift it into [0, 63].
    l = MAX(-32, MIN(31, l)) + 32;

    // Scales 0-7: store the lower 4 bits of l in the low nibble of y[i].scales[j].
    if (j < 8) {
        y[i].scales[j] = l & 0xF;
    } 
    // Scales 8-15: store the lower 4 bits of l in the high nibble of y[i].scales[j-8].
    else {
        y[i].scales[j-8] |= ((l & 0xF) << 4);
    }

    // Keep only the top 2 bits of l (l is in [0, 63], so l >> 4 is in [0, 3]).
    l >>= 4;

    // Pack the top 2 bits into bytes 8-11: byte j % 4 + 8 holds four 2-bit fields,
    // and 2 * (j / 4) selects the field (a shift of 0, 2, 4, or 6).
    y[i].scales[j % 4 + 8] |= (l << (2 * (j / 4)));
}
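To see the layout end to end, here is a minimal, self-contained sketch of the same packing scheme together with its inverse. The function names (pack_scales_6bit, unpack_scales_6bit) are hypothetical, not part of llama.cpp; the bit layout mirrors the loop above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Hypothetical helper mirroring the loop above: packs 16 signed 6-bit
// values (each in [-32, 31]) into a 12-byte array.
static void pack_scales_6bit(const int8_t *vals, uint8_t scales[12]) {
    memset(scales, 0, 12);
    for (int j = 0; j < 16; ++j) {
        int v = vals[j];
        v = (v < -32 ? -32 : (v > 31 ? 31 : v)) + 32;       // clamp, shift to [0, 63]
        if (j < 8) {
            scales[j] = (uint8_t)(v & 0xF);                 // low nibble of bytes 0-7
        } else {
            scales[j - 8] |= (uint8_t)((v & 0xF) << 4);     // high nibble of bytes 0-7
        }
        v >>= 4;                                            // remaining top 2 bits
        scales[j % 4 + 8] |= (uint8_t)(v << (2 * (j / 4))); // 2-bit fields in bytes 8-11
    }
}

// Inverse: recover the 16 signed values from the 12 packed bytes.
static void unpack_scales_6bit(const uint8_t scales[12], int8_t *vals) {
    for (int j = 0; j < 16; ++j) {
        uint8_t lo4 = (j < 8) ? (scales[j] & 0xF) : (scales[j - 8] >> 4);
        uint8_t hi2 = (scales[j % 4 + 8] >> (2 * (j / 4))) & 0x3;
        vals[j] = (int8_t)(((hi2 << 4) | lo4) - 32);
    }
}
```

Round-tripping any in-range input through pack_scales_6bit and unpack_scales_6bit returns the original values, which is a quick way to convince yourself the nibble fields and the 2-bit fields never overlap.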

The key aspects of this code include:

  • Scaling and Normalization: maps each float scale into the signed 6-bit range [-32, 31], then shifts it to [0, 63] for packing.
  • Bitwise Operations: uses masking (&), shifting (<<, >>), and OR-assignment (|=) to spread each 6-bit value across a nibble and a 2-bit field.
  • Compact Storage: sixteen 6-bit scales occupy 12 bytes (96 bits) instead of the 16 bytes that one byte per scale would require.

This approach is particularly useful where memory is at a premium, such as in quantized model formats, embedded systems, or large-dataset processing.
