EADST

Convert PDFs to Images

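Use Python and the pdf2image library to convert PDF documents into JPEG images, one image per page. pdf2image is a wrapper around Poppler, so Poppler must be installed on the system for the script to run.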

from pdf2image import convert_from_path
import os

def convert_pdf_to_images(pdf_path, output_folder):
    # Create the output folder; do nothing if it already exists
    os.makedirs(output_folder, exist_ok=True)

    # Render every page of the PDF to a PIL image with pdf2image
    images = convert_from_path(pdf_path)

    for i, image in enumerate(images):
        image_path = os.path.join(output_folder, f"page_{i+1}.jpg")
        image.save(image_path, 'JPEG')

def process_all_pdfs(pdf_folder):
    for root, dirs, files in os.walk(pdf_folder):
        for file in files:
            if file.lower().endswith('.pdf'):
                pdf_path = os.path.join(root, file)
                output_folder = os.path.join(root, os.path.splitext(file)[0])
                convert_pdf_to_images(pdf_path, output_folder)

if __name__ == "__main__":
    # Placeholder path: replace with the folder that contains your PDFs
    pdf_folder = '/your_folder_path/'
    process_all_pdfs(pdf_folder)
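
The call convert_from_path(pdf_path) above returns every page at once, which can use a lot of memory for long PDFs. One possible way around this is to render a few pages at a time, as in the minimal sketch below. The names page_ranges and convert_pdf_in_chunks and the chunk_size value are hypothetical and not part of the original script; pdfinfo_from_path and the dpi, first_page and last_page arguments come from pdf2image.

```python
import os


def page_ranges(total_pages, chunk_size):
    """Yield inclusive, 1-based (first_page, last_page) pairs covering the document."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for first in range(1, total_pages + 1, chunk_size):
        yield first, min(first + chunk_size - 1, total_pages)


def convert_pdf_in_chunks(pdf_path, output_folder, chunk_size=10, dpi=200):
    """Hypothetical variant of convert_pdf_to_images that renders a few pages at a time."""
    # Imported here so page_ranges stays usable without pdf2image installed.
    from pdf2image import convert_from_path, pdfinfo_from_path

    os.makedirs(output_folder, exist_ok=True)

    # pdfinfo_from_path reports document metadata, including the page count.
    total_pages = pdfinfo_from_path(pdf_path)["Pages"]

    for first, last in page_ranges(total_pages, chunk_size):
        # Only pages first..last are held in memory during this iteration.
        images = convert_from_path(
            pdf_path, dpi=dpi, first_page=first, last_page=last
        )
        for page_number, image in enumerate(images, start=first):
            image_path = os.path.join(output_folder, f"page_{page_number}.jpg")
            image.save(image_path, "JPEG")
```

To try it, call convert_pdf_in_chunks(pdf_path, output_folder) in place of convert_pdf_to_images inside process_all_pdfs. A smaller chunk_size lowers peak memory at the cost of more Poppler invocations.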