Chinese-Annotated Transformers DeepSeek V3 Model Code: modeling_deepseek_v3.py
Author: XD / Published: April 24, 2025, 05:46 / Programming Notes
In the Hugging Face transformers library, managing large models efficiently is crucial, especially when disk space is limited or checkpoint files must stay under a given size. Two features help with this: checkpoint sharding and the SafeTensors format.
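As a minimal sketch of both features together (the checkpoint name and output directory are placeholders), save_pretrained can split the weights into shards and write them as SafeTensors files:

```python
from transformers import AutoModel

# Placeholder checkpoint; any Hugging Face model works the same way.
model = AutoModel.from_pretrained("bert-base-uncased")

# save_pretrained splits the checkpoint into shards no larger than
# max_shard_size; safe_serialization=True (the default in recent
# transformers versions) writes *.safetensors files instead of
# pickle-based *.bin files.
model.save_pretrained(
    "./bert-sharded",        # hypothetical output directory
    max_shard_size="200MB",  # small limit to force sharding for bert-base
    safe_serialization=True,
)
```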
Save a Hugging Face Model as One .bin File
max_shard_size (int or str, optional, defaults to "10GB") — Only applicable for models. The maximum size a checkpoint may reach before it is sharded. Each shard will then be smaller than this size. If expressed as a string, it must be digits followed by a unit (like "5MB").
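To get a single pytorch_model.bin instead of shards, one approach (a sketch, with a hypothetical output path) is to raise max_shard_size above the total checkpoint size and disable SafeTensors serialization:

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-uncased")

# A max_shard_size larger than the whole checkpoint disables sharding,
# and safe_serialization=False falls back to a single pytorch_model.bin
# rather than model.safetensors.
model.save_pretrained(
    "./bert-one-bin",  # hypothetical output directory
    max_shard_size="100GB",
    safe_serialization=False,
)
```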
Saving a Llama Model in INT8 Format
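A sketch of 8-bit loading and saving via the bitsandbytes integration; the checkpoint name is a placeholder, and a CUDA GPU plus the bitsandbytes package are assumed:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Quantize linear layers to INT8 at load time (bitsandbytes LLM.int8()).
quant_config = BitsAndBytesConfig(load_in_8bit=True)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # placeholder Llama checkpoint
    quantization_config=quant_config,
    device_map="auto",           # dispatch layers across available devices
)

# Recent transformers versions (with a recent bitsandbytes) can
# serialize 8-bit checkpoints back to disk with save_pretrained.
model.save_pretrained("./llama-int8")  # hypothetical output directory
```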
Check "bert-base-uncased" Model Structure