EADST

Extract Webpage Information with Python

The following Python program extracts information from a webpage with BeautifulSoup and saves the data to a CSV file.

from bs4 import BeautifulSoup
import urllib.request
import pandas as pd

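# Local copy of the saved roster page, and the base name of the output CSV file.
url = 'file:///Users/xd/Desktop/ieee/Region_5_Student_Branch_Counselors_and_Chairs.htm'
save_file = 'ieee_info_1'
html = urllib.request.urlopen(url).read()

soup = BeautifulSoup(html, "html.parser")

# University names and roster entries live in separate <div> elements;
# they are paired by position in the loop below.
universities = soup.find_all('div', class_='spoName bullet pad-t15')
people = soup.find_all('div', class_='roster-results')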

# Column order for each CSV row; the file is written without a header line.
list_name = ['university', 'name', 'title', 'address', 'email']

for u, p in zip(universities, people):
    info = p.find_all('p')
    university = u.get_text()
    name = info[0].get_text()
    # Skip entries whose position is not filled.
    if name == 'Position Vacant':
        continue
    title = info[2].get_text()
    address = info[3].get_text() + ', ' + info[4].get_text()
    # Drop the first 7 characters of the last paragraph, which precede the email address.
    email = info[-1].get_text()[7:]

    # Append this entry to the CSV file as a single row.
    content = [[university, name, title, address, email]]
    data = pd.DataFrame(columns=list_name, data=content)
    data.to_csv("{}.csv".format(save_file), mode='a', index=False, header=False, encoding='utf-8')
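
The loop above indexes the paragraphs by fixed position and appends to the CSV file once per entry, so a short entry raises an IndexError and rerunning the script duplicates rows. A more defensive variant might look like the sketch below. It is only an illustration under assumptions: the names build_row, write_rows, COLUMNS and EMAIL_PREFIX are hypothetical, the 'Email: ' label is a guess at what the 7 dropped characters are, and the standard csv module is swapped in for pandas so that all rows are written in one pass with a header line.

```python
import csv
import io

COLUMNS = ['university', 'name', 'title', 'address', 'email']
# Assumption: the last paragraph starts with this 7-character label.
EMAIL_PREFIX = 'Email: '


def build_row(university, paragraphs):
    """Return one CSV row, or None if the entry should be skipped.

    `paragraphs` holds the text of each <p> tag of one roster entry,
    in the same order the original loop indexes them.
    """
    texts = [t.strip() for t in paragraphs]
    # Skip vacant positions and entries too short to index safely.
    if len(texts) < 5 or texts[0] == 'Position Vacant':
        return None
    email = texts[-1]
    if email.startswith(EMAIL_PREFIX):
        email = email[len(EMAIL_PREFIX):]
    address = texts[3] + ', ' + texts[4]
    return [university.strip(), texts[0], texts[2], address, email]


def write_rows(rows, out):
    """Write a header line followed by all rows to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(COLUMNS)
    writer.writerows(rows)


# Made-up sample data standing in for the text of the parsed <p> tags.
sample = ['Jane Doe', 'Member', 'Counselor', '1 Main St', 'Austin, TX',
          'Email: jane@example.edu']
entries = [('Example University', sample), ('Other University', ['Position Vacant'])]

rows = [row for row in (build_row(u, p) for u, p in entries) if row is not None]
buf = io.StringIO()  # an open file object would be used in the real script
write_rows(rows, buf)
print(buf.getvalue())
```

In the original script, the paragraphs argument would be built as [x.get_text() for x in p.find_all('p')], and buf would be replaced by a file opened with open(save_file + '.csv', 'w', newline='', encoding='utf-8').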