Trying to create a web scraper, but the code below keeps printing all the HTML

Posted 2024-04-19 19:24:08


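I am trying to pull my stats from media.com and have built a bot that logs in. When I get to the stats page and try to print the story titles, it keeps throwing all of the HTML at me. The print call is only there to check that the right content is printed before I continue: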

url = driver.page_source
headers = {"Accept-Language": "en-US, en;q=0.5"}
results = requests.get(url, headers=headers)

soup = BeautifulSoup(url, "lxml")

story_title = []
publication = []
views = []
reads = []
read_ratio = []
fans = []

stats_div = soup.find_all('tr', class_='sortableTable-row js-statsTableRow')
for container in stats_div:
    name = container.td.a.text.find('span', class_='sortableTable-title u-maxWidth450')
    story_title.append(name)

print(story_title)
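
For comparison, here is a minimal sketch of one way the title extraction could be written. Two things in the snippet above look likely to cause trouble: `driver.page_source` is already the rendered HTML string rather than a URL, so passing it to `requests.get` is not needed, and `.text` returns a plain `str`, so the `.find(...)` chained after it is `str.find`, not BeautifulSoup's `find`. The `SAMPLE_HTML` string and the `extract_story_titles` helper are hypothetical and only stand in for the real page, whose markup is assumed from the class names in the question. The built-in `html.parser` is used here to avoid an extra dependency; `lxml` should work the same way.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for driver.page_source. Selenium already returns the
# rendered HTML as a string, so no requests.get() call is involved here.
SAMPLE_HTML = """
<table>
  <tr class="sortableTable-row js-statsTableRow">
    <td><a href="#"><span class="sortableTable-title u-maxWidth450">First story</span></a></td>
  </tr>
  <tr class="sortableTable-row js-statsTableRow">
    <td><a href="#"><span class="sortableTable-title u-maxWidth450">Second story</span></a></td>
  </tr>
</table>
"""


def extract_story_titles(html):
    """Return the text of each title span found in the stats rows."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    # CSS selectors match each class separately, regardless of attribute order.
    for row in soup.select("tr.sortableTable-row.js-statsTableRow"):
        # Search the tag itself, not its .text, so the result is still a Tag.
        span = row.select_one("span.sortableTable-title")
        if span is not None:
            titles.append(span.get_text(strip=True))
    return titles


story_title = extract_story_titles(SAMPLE_HTML)
print(story_title)
```

With a live session, the same helper could presumably be called as `extract_story_titles(driver.page_source)`, provided the stats table has finished loading before the page source is read.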
