What is the best way to scrape data from a website with multiple links using Python?
In the example below, the page lists all of the alumni association chapters for Virginia Tech. I want to visit each chapter's page and write every piece of listed information to a CSV file. I tried using BeautifulSoup but had no success.
Any help on this topic is greatly appreciated, thanks!
url=https://www.alumni.vt.edu/chapters/chapter_list.html
from bs4 import BeautifulSoup
import requests
website = 'https://www.alumni.vt.edu/chapters/chapter_list.html'
result = requests.get(website)
content = result.text
soup = BeautifulSoup(content, 'lxml')
print(soup.prettify())
1 Answer
Here's an example of how to open each link from the chapter list page and grab some information from the sub-pages:
import requests
from bs4 import BeautifulSoup

url = "https://www.alumni.vt.edu/chapters/chapter_list.html"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

# collect the URL of every chapter link on the list page
links = []
for a in soup.select(".general-body li > a"):
    links.append(a["href"])

for u in links:
    print(f"Opening {u}")
    soup = BeautifulSoup(requests.get(u).content, "html.parser")

    # get some info here, e.g. the contact line:
    contact = soup.select_one(".general-body strong:-soup-contains(Contact)")
    if contact:
        # the contact name/e-mail is the text node right after the <strong> tag
        c = contact.next_element.next_element
        c = c.text.strip()
        print(contact.text, c)
Prints:
Opening https://alumni.vt.edu/chapters/chapter_list/alleghany_highlands.html
Contact: Kathleen All
Opening https://alumni.vt.edu/chapters/chapter_list/augusta.html
Contact: augustacountyhokies@gmail.com
Opening https://alumni.vt.edu/chapters/chapter_list/central_virginia.html
Contact: Sammy Paris
Opening https://alumni.vt.edu/chapters/chapter_list/charlottesville.html
Contact: Martin Harar
Opening https://alumni.vt.edu/chapters/chapter_list/commonwealth.html
Contact: Volunteers Needed
...
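Since the stated goal is a CSV file, the loop above can be extended to write one row per chapter with Python's built-in csv module. A minimal sketch under the same selectors as above; the output filename "chapters.csv" and the column names "chapter_url" and "contact" are assumptions, not anything from the site:

import csv

import requests
from bs4 import BeautifulSoup

url = "https://www.alumni.vt.edu/chapters/chapter_list.html"
soup = BeautifulSoup(requests.get(url).content, "html.parser")
links = [a["href"] for a in soup.select(".general-body li > a")]

# "chapters.csv" and the column names are made up for this sketch;
# newline="" prevents blank rows on Windows
with open("chapters.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["chapter_url", "contact"])
    for u in links:
        page = BeautifulSoup(requests.get(u).content, "html.parser")
        contact = page.select_one(".general-body strong:-soup-contains(Contact)")
        value = ""
        if contact:
            # same extraction as above: the text node right after <strong>
            value = contact.next_element.next_element.text.strip()
        writer.writerow([u, value])

You can add more columns the same way: select each field on the chapter page with its own CSS selector and append it to the row before writerow.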