PyTorch Q4_0 Quantize and Dequantize Aligned with llama.cpp
Author: XD / Published: November 13, 2023, 21:42 / Programming Notes / Views: 1743
Check the Size in KB of a File or Folder in Linux
Qwen-7B-Chat Model Structure Annotations
Update Code in Django+Nginx+uwsgi Environment
from diffusers.utils import randn_tensor ImportError: cannot import name 'randn_tensor' from 'diffusers.utils'
pip install FlashAttention
Change ModelScope Cache Folder
Download Model or Dataset from ModelScope
Redirecting a Subpath to a Port with Nginx
Redirecting to a URL Using Nginx Configuration
Cloudreve: Personal Web Disk with Docker
Install Docker on Linux
Save the LLAMA Model with LoRA to One Model
Save Hugging Face Model with One Bin
max_shard_size (int or str, optional, defaults to "10GB"): Only applicable for models. The maximum size of a checkpoint before it is sharded. Each checkpoint shard will then be smaller than this size. If expressed as a string, it must be digits followed by a unit (like "5MB").
LLAMA Model Save with INT8 Format
Baidu Translation API Code