EADST

Convert PDFs to Images

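Use Python and the pdf2image library to convert PDF documents into images, one JPEG file per page. The script below searches a folder recursively and, for every PDF it finds, saves the pages into a new subfolder named after that PDF.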
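pdf2image does not render PDFs itself. It calls Poppler's command-line tools (such as `pdftoppm`), so the conversion fails if Poppler is not installed or not on `PATH`. If Poppler lives somewhere else, `convert_from_path` also accepts a `poppler_path` argument. Here is a quick pre-flight check, written as a sketch with a hypothetical `check_prerequisites` helper:

```python
import importlib.util
import shutil


def check_prerequisites():
    """Hypothetical helper: return a list of missing prerequisites (empty if all found)."""
    missing = []
    # Is the pdf2image package importable in this environment?
    if importlib.util.find_spec("pdf2image") is None:
        missing.append("pdf2image (pip install pdf2image)")
    # pdf2image shells out to Poppler's pdftoppm; check that it is on PATH
    if shutil.which("pdftoppm") is None:
        missing.append("Poppler's pdftoppm on PATH")
    return missing


if __name__ == "__main__":
    for item in check_prerequisites():
        print("Missing:", item)
```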

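```python
import os

from pdf2image import convert_from_path


def convert_pdf_to_images(pdf_path, output_folder):
    """Save every page of pdf_path as page_<n>.jpg inside output_folder."""
    os.makedirs(output_folder, exist_ok=True)

    # Render each page to a PIL image (pdf2image calls Poppler under the hood;
    # the default resolution is 200 dpi)
    images = convert_from_path(pdf_path)

    # Number pages from 1 so the file names match the PDF's page numbers
    for page_number, image in enumerate(images, start=1):
        image_path = os.path.join(output_folder, f"page_{page_number}.jpg")
        image.save(image_path, "JPEG")


def process_all_pdfs(pdf_folder):
    """Convert every .pdf under pdf_folder, including subfolders.

    Each PDF's pages go into a folder next to it named after the file,
    e.g. reports/a.pdf -> reports/a/page_1.jpg, reports/a/page_2.jpg, ...
    """
    for root, _dirs, files in os.walk(pdf_folder):
        for file in files:
            if file.lower().endswith(".pdf"):
                pdf_path = os.path.join(root, file)
                output_folder = os.path.join(root, os.path.splitext(file)[0])
                convert_pdf_to_images(pdf_path, output_folder)


if __name__ == "__main__":
    pdf_folder = "/your_folder_path/"  # replace with the folder that holds your PDFs
    process_all_pdfs(pdf_folder)
```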
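`convert_from_path` returns a list holding every page as a PIL image, which can use a lot of memory for long documents. The following is a minimal sketch, assuming you would rather render pages in batches. It reads the page count with pdf2image's `pdfinfo_from_path` and then converts one slice of pages at a time through the `first_page` and `last_page` arguments. The function name `convert_pdf_in_batches` and its `batch_size`, `page_count` and `convert` parameters are hypothetical. `convert` exists only so the batching logic can be exercised without Poppler.

```python
import os


def convert_pdf_in_batches(pdf_path, output_folder, batch_size=10, dpi=200,
                           page_count=None, convert=None):
    """Hypothetical helper: save each page as page_<n>.jpg, batch_size pages at a time.

    `convert` defaults to pdf2image.convert_from_path, and `page_count` defaults
    to the "Pages" entry reported by pdf2image.pdfinfo_from_path.
    """
    if convert is None or page_count is None:
        from pdf2image import convert_from_path, pdfinfo_from_path
        convert = convert or convert_from_path
        if page_count is None:
            page_count = pdfinfo_from_path(pdf_path)["Pages"]

    os.makedirs(output_folder, exist_ok=True)
    saved = []
    for first in range(1, page_count + 1, batch_size):
        last = min(first + batch_size - 1, page_count)
        # Only pages first..last (1-based, inclusive) are held in memory here
        images = convert(pdf_path, dpi=dpi, first_page=first, last_page=last)
        for page_number, image in enumerate(images, start=first):
            image_path = os.path.join(output_folder, f"page_{page_number}.jpg")
            image.save(image_path, "JPEG")
            saved.append(image_path)
    return saved
```

To use it in the script above, call `convert_pdf_in_batches(pdf_path, output_folder)` inside `process_all_pdfs` in place of `convert_pdf_to_images`. The output file names stay the same.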