llama.cpp: Efficient 6-bit Data Packing in an 8-bit Array

This code snippet, adapted from llama.cpp by ggerganov, demonstrates a method for efficiently packing 6-bit values into an 8-bit uint8 array. Through scaling, clamping, and bitwise manipulation, it stores quantized per-block scales in less space than one byte per value, which matters when model data must fit tight memory budgets.
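Concretely, reading the layout off the code below: with QK_K = 256, each block carries 16 sub-block scales, each quantized to 6 bits. Bytes 0-7 of y[i].scales hold the low 4 bits of scales 0-7 in their low nibbles and of scales 8-15 in their high nibbles, while bytes 8-11 hold the remaining high 2 bits, four scales per byte. Sixteen 6-bit values therefore occupy 12 bytes instead of 16.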

// Compute the inverse scale that maps max_scale onto -32, the most negative 6-bit value.
float iscale = -32.f/max_scale;
// With QK_K = 256, there are QK_K/16 = 16 sub-block scales to quantize and pack.
for (int j = 0; j < QK_K/16; ++j) {
    // Quantize the j-th scale: multiply by iscale and round to the nearest integer.
    int8_t l = nearest_int(iscale * scales[j]);

    // Clamp to the signed 6-bit range [-32, 31], then add 32 to shift into [0, 63].
    l = MAX(-32, MIN(31, l)) + 32;

    // Low 4 bits: scales 0-7 go into the low nibbles of bytes 0-7 of y[i].scales ...
    if (j < 8) {
        y[i].scales[j] = l & 0xF;
    }
    // ... and scales 8-15 go into the high nibbles of those same bytes.
    else {
        y[i].scales[j-8] |= ((l & 0xF) << 4);
    }

    // Keep only the high 2 bits of the 6-bit value.
    l >>= 4;

    // Pack the high 2 bits into bytes 8-11: byte 8 + j%4 receives them at bit
    // offset 2*(j/4), so each of these four bytes collects the top bits of four
    // scales. These bytes are only ever OR-ed into, so they must start at zero.
    y[i].scales[j % 4 + 8] |= (l << (2 * (j / 4)));
}
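To make the layout concrete, the sketch below inverts the packing: it recovers the j-th 6-bit value from the 12-byte array. The helper name unpack_scale and its standalone form are illustrative assumptions, not code from llama.cpp, which does the equivalent unpacking in its dequantization code.

#include <stdint.h>

// Hypothetical helper (not from llama.cpp): invert the packing above and
// return the j-th scale as a signed value in [-32, 31], for j in 0..15.
static inline int8_t unpack_scale(const uint8_t scales[12], int j) {
    // Low 4 bits: low nibbles of bytes 0-7 for j < 8, high nibbles for j >= 8.
    uint8_t lo = j < 8 ? (scales[j] & 0xF) : (scales[j - 8] >> 4);
    // High 2 bits: stored in byte 8 + j%4 at bit offset 2*(j/4).
    uint8_t hi = (scales[8 + j % 4] >> (2 * (j / 4))) & 0x3;
    // Reassemble the 6-bit value and undo the +32 offset.
    return (int8_t)((hi << 4) | lo) - 32;
}

Round-tripping all sixteen indices through this helper and comparing against the values before packing is a quick way to validate the layout.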

The key aspects of this code include:

  • Scaling and Normalization: iscale maps the floating-point scales into the signed 6-bit range, and the +32 offset shifts the clamped values into [0, 63] for unsigned storage.
  • Bitwise Operations: masking (&), shifting (<<, >>), and bitwise OR (|=) split each 6-bit value into a 4-bit part and a 2-bit part and pack both, as in the worked trace below.
  • Data Optimization: sixteen 6-bit values fit in 12 bytes instead of 16, making more efficient use of memory and potentially speeding up processing.
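For a concrete trace (values chosen for illustration, not from the source): take j = 10 with the clamped, offset value l = 45 = 0b101101. Since j >= 8, the low nibble l & 0xF = 0b1101 is OR-ed into the high nibble of y[i].scales[2]; after l >>= 4, the remaining two bits 0b10 are OR-ed into y[i].scales[10 % 4 + 8] = y[i].scales[10] at bit offset 2 * (10 / 4) = 4.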

This approach is particularly useful in scenarios where memory optimization is crucial, such as in embedded systems or when dealing with large datasets.
