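Hi, I'm building a web-scraping tool that parses websites and determines whether they are e-commerce sites (that is, whether they contain markers such as Shopify, Magento, etc.). I read the URLs from a CSV file. I want to look through the head and find that word, which is part of a piece of HTML or something similar.
Example from https://www.thecriticalslidesociety.com/
Right now I can look at the head on its own, but when I try to search for the HTML string I get nothing.
Can anyone tell me what to do next?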
import csv
import re
from urllib.request import Request, urlopen as uReq

import requests
from bs4 import BeautifulSoup as soup

# Browser-like User-Agent, sent with every request
web_header = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}

with open('URL.csv', 'r', newline='') as csv_file:
    csv_reader = csv.reader(csv_file)
    with open('Approval.csv', 'w', newline='') as Approval_file:
        csv_writer = csv.writer(Approval_file, delimiter='_')
        next(csv_reader)  # skip the CSV header row
        for line in csv_reader:
            my_URL = line[0]
            # Fetch the page, passing the User-Agent header along
            Uclient = uReq(Request(my_URL, headers=web_header))
            page_html = Uclient.read()
            Uclient.close()
            page_soup = soup(page_html, "html.parser")
            # find() returns the <head> Tag object, or None if there is none
            search = page_soup.find("head")
            print(search)
Output
<head>
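
One possible next step, as a rough sketch rather than a definitive answer: turn the head into plain text and check it for platform-specific substrings. The `PLATFORM_MARKERS` table and the `detect_platforms` helper below are hypothetical names, and the marker strings are assumptions for illustration, not a verified fingerprint list.

```python
# Hypothetical marker table: these substrings are assumptions for illustration,
# not a verified list of platform fingerprints.
PLATFORM_MARKERS = {
    "shopify": ("cdn.shopify.com", "shopify"),
    "magento": ("magento",),
}


def detect_platforms(html_text):
    """Return the sorted names of platforms whose markers occur in html_text."""
    lowered = html_text.lower()  # make the search case-insensitive
    return sorted(
        name
        for name, markers in PLATFORM_MARKERS.items()
        if any(marker in lowered for marker in markers)
    )


# Made-up sample markup standing in for a real page head
sample_head = '<head><script src="https://cdn.shopify.com/s/files/theme.js"></script></head>'
print(detect_platforms(sample_head))
```

In the loop above, `str(search)` could be passed to such a helper when `search` is not `None`, and the result written out with `csv_writer.writerow(...)`. Searching the markup as a plain string avoids having to know which tag or attribute carries the marker.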