EADST

Extract Webpage Information with Python

Here is a Python program that extracts information from a webpage with BeautifulSoup and saves the data to a CSV file.

from bs4 import BeautifulSoup
import urllib.request
import pandas as pd

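# Saved local copy of the roster page; urlopen accepts file:// URLs as well as http(s).
url = 'file:///Users/xd/Desktop/ieee/Region_5_Student_Branch_Counselors_and_Chairs.htm'
# Output file name, without the .csv extension.
save_file = 'ieee_info_1'
html = urllib.request.urlopen(url).read()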

soup = BeautifulSoup(html, "html.parser")

universities = soup.find_all('div', class_='spoName bullet pad-t15')
people = soup.find_all('div', class_='roster-results')

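list_name = ['university', 'name', 'title', 'address', 'email']
content = []

# Each university block is paired with the roster block at the same position.
for u, p in zip(universities, people):
    info = p.find_all('p')
    university = u.get_text()
    name = info[0].get_text()
    # Skip branches whose position is unfilled.
    if name == 'Position Vacant':
        continue
    title = info[2].get_text()
    address = info[3].get_text() + ', ' + info[4].get_text()
    # Drop the first 7 characters of the last paragraph (presumably a label such as "Email: ").
    email = info[-1].get_text()[7:]
    content.append([university, name, title, address, email])

# Build the table once and append all rows in a single write (no header row, as before).
data = pd.DataFrame(columns=list_name, data=content)
data.to_csv("{}.csv".format(save_file), mode='a', index=False, header=False, encoding='utf-8')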
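
The program above indexes straight into the list of `<p>` tags, so a roster entry with fewer paragraphs than expected would raise an `IndexError`. Below is a minimal sketch of a more defensive version of the same extraction step. It assumes the page uses the same two class names as above and that the last paragraph starts with the label `Email: `. The `SAMPLE_HTML` snippet, the `EMAIL_LABEL` constant, and the `extract_rows` helper are made up for illustration and are not taken from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the structure the program above expects.
SAMPLE_HTML = """
<div class="spoName bullet pad-t15">Example University</div>
<div class="roster-results">
  <p>Jane Doe</p><p>Member</p><p>Branch Counselor</p>
  <p>1 Main St</p><p>Springfield, TX</p><p>Email: jane@example.edu</p>
</div>
<div class="spoName bullet pad-t15">Sample College</div>
<div class="roster-results"><p>Position Vacant</p></div>
"""

# Assumed label in front of the address; the original code slices off 7 characters.
EMAIL_LABEL = "Email: "


def extract_rows(html):
    """Return one dict per filled position found in the HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    universities = soup.find_all("div", class_="spoName bullet pad-t15")
    people = soup.find_all("div", class_="roster-results")

    rows = []
    for u, p in zip(universities, people):
        info = [tag.get_text(strip=True) for tag in p.find_all("p")]
        # Skip empty or vacant entries.
        if not info or info[0] == "Position Vacant":
            continue
        # Indices 0, 2, 3 and 4 are used below, so at least 5 paragraphs are needed.
        if len(info) < 5:
            continue
        email = info[-1]
        # Remove the label only when it is actually present.
        if email.startswith(EMAIL_LABEL):
            email = email[len(EMAIL_LABEL):]
        rows.append({
            "university": u.get_text(strip=True),
            "name": info[0],
            "title": info[2],
            "address": info[3] + ", " + info[4],
            "email": email,
        })
    return rows


if __name__ == "__main__":
    for row in extract_rows(SAMPLE_HTML):
        print(row)
```

The returned list of dicts can be passed directly to `pd.DataFrame(rows)` and written with `to_csv`, as in the program above.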

Author: XD