EADST

Extract Webpage Information with Python

Here is a Python program that extracts information from a webpage with BeautifulSoup and saves the data to a CSV file.

from bs4 import BeautifulSoup
import urllib.request
import pandas as pd

# Local copy of the roster page and the output file name (without extension)
url = 'file:///Users/xd/Desktop/ieee/Region_5_Student_Branch_Counselors_and_Chairs.htm'
save_file = 'ieee_info_1'
html = urllib.request.urlopen(url).read()

soup = BeautifulSoup(html, "html.parser")

# The n-th university heading is paired with the n-th roster block
universities = soup.find_all('div', class_='spoName bullet pad-t15')
people = soup.find_all('div', class_='roster-results')

list_name = ['university', 'name', 'title', 'address', 'email']
content = []

for u, p in zip(universities, people):
    info = p.find_all('p')
    university = u.get_text()
    name = info[0].get_text()
    # Skip entries whose position is unfilled
    if name == 'Position Vacant':
        continue
    title = info[2].get_text()
    address = info[3].get_text() + ', ' + info[4].get_text()
    # Drop the first 7 characters of the last paragraph (presumably a label before the address)
    email = info[-1].get_text()[7:]
    content.append([university, name, title, address, email])

# Append all rows to the CSV in a single write; no header row is written
data = pd.DataFrame(columns=list_name, data=content)
data.to_csv("{}.csv".format(save_file), mode='a', index=False, header=False, encoding='utf-8')
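
The script above indexes the paragraphs of each roster entry by fixed position, so it would raise an IndexError on an entry with fewer paragraphs than expected. Below is a minimal sketch of a more defensive version of that step, using only the standard library. The helper names parse_entry and write_rows, the "Email: " prefix, and the sample data are all assumptions made for illustration; they do not come from the original page.

```python
import csv
import io

FIELDS = ["university", "name", "title", "address", "email"]


def parse_entry(university, paragraphs, email_prefix="Email: "):
    """Return one CSV row as a dict, or None if the entry should be skipped.

    `paragraphs` holds the <p> texts of one roster entry, in the order the
    script above indexes them (name at 0, title at 2, address at 3 and 4,
    email last). The prefix is an assumed label, not taken from the real page.
    """
    texts = [p.strip() for p in paragraphs]
    # Skip entries that are too short to index safely, or that are vacant
    if len(texts) < 5 or texts[0] == "Position Vacant":
        return None
    email = texts[-1]
    # Remove the label only if it is actually present
    if email.startswith(email_prefix):
        email = email[len(email_prefix):]
    return {
        "university": university.strip(),
        "name": texts[0],
        "title": texts[2],
        "address": texts[3] + ", " + texts[4],
        "email": email,
    }


def write_rows(rows, out):
    """Write a header line followed by all rows to a file-like object."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Made-up sample data standing in for the parsed page
sample = [
    ("Example University",
     ["Jane Doe", "", "Counselor", "1 Main St", "Springfield",
      "Email: jane@example.edu"]),
    ("Vacant College", ["Position Vacant"]),
]

rows = [r for r in (parse_entry(u, p) for u, p in sample) if r is not None]
buffer = io.StringIO()
write_rows(rows, buffer)
print(buffer.getvalue())
```

In the real script, the paragraphs argument could be built with [x.get_text() for x in info], and write_rows could be given a file opened with open(..., 'w', newline='', encoding='utf-8') instead of the in-memory buffer.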
About Me
XD